Modeling the French presidential elections of 2022 with CoDa tools

Authors

Lukas Dargel

Christine Thomas-Agnan

Published

March 8, 2024

Modified

March 8, 2024

This vignette accompanies our article:

Dargel, Lukas, and Christine Thomas-Agnan (2024). Pairwise share ratio interpretations of compositional regression models. Computational Statistics & Data Analysis, Volume 195, July 2024, 107945. Accessed 08 Mar 2024. Online at https://doi.org/10.1016/j.csda.2024.107945

The vignette contains all the code necessary to reproduce the article’s results and provides additional illustrations.

Code
# cran dependencies
library("colorspace")
library("CoDaImpact")
library("compositions")
library("data.table")
library("here")
library("ggplot2")
library("kableExtra")
library("skimr")
library("stringr")
library("zCompositions")

# local functions
save_baseplot <- function(device, ..., which){
  suppressMessages({
    dev.copy(device, ..., which = which) # pass `which` by name: arguments after `...` are never matched by position
    dev.off()
    invisible(TRUE)
  })
}

# options
options(
  knitr.kable.NA = '',
  scipen = 9,
  warn = 1)

# data
mun_elec2census <- readRDS(here("out/data/mun_elec2census.Rds")) # election data combined with the census
mun_elec <- readRDS(here("in/data/mun_elec.Rds")) # original election data 

1 Introduction

We illustrate the use of compositional data analysis (CoDa) to analyze the results of political elections using the example of the French presidential elections of 2022. Our primary focus is the illustration of mathematical tools that enable new ways to interpret CoDa regression models. Therefore, other issues, such as variable selection and zeros handling, are treated more briefly.

2 Descriptive statistics

The study combines the official election results with census data.

2.1 Data overview

Let us have a first look at the combined data set.

Code
skim(mun_elec2census)
Data summary
Name mun_elec2census
Number of rows 34820
Number of columns 28
Key ID_MUN
_______________________
Column type frequency:
character 3
numeric 25
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
ID_MUN 0 1 5 5 0 34820 0
NAME_MUN 0 1 1 45 0 32597 0
NAME_DEP 0 1 3 23 0 96 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
ABSTENTIONS 0 1 310.15 2230.35 0.00 31.00 70.00 181.00 296668.00 ▇▁▁▁▁
BLANK 0 1 14.97 80.08 0.00 2.00 5.00 12.00 11028.00 ▇▁▁▁▁
INVALID 0 1 6.48 38.74 0.00 1.00 2.00 5.00 5267.00 ▇▁▁▁▁
ARTHAUD_Nathalie 0 1 5.38 23.20 0.00 0.00 2.00 5.00 2891.00 ▇▁▁▁▁
ROUSSEL_Fabien 0 1 22.78 126.89 0.00 2.00 6.00 17.00 17267.00 ▇▁▁▁▁
MACRON_Emmanuel 0 1 269.64 2294.94 0.00 29.00 68.00 178.00 372820.00 ▇▁▁▁▁
LASSALLE_Jean 0 1 31.17 107.58 0.00 6.00 13.00 29.00 12139.00 ▇▁▁▁▁
LE_PEN_Marine 0 1 227.81 777.14 0.00 36.00 83.00 197.00 72772.00 ▇▁▁▁▁
ZEMMOUR_Eric 0 1 69.17 581.52 0.00 8.00 18.00 46.00 86088.00 ▇▁▁▁▁
MELENCHON_Jean_Luc 0 1 208.91 2104.10 0.00 18.00 43.00 113.00 317372.00 ▇▁▁▁▁
HIDALGO_Anne 0 1 16.92 144.04 0.00 1.00 4.00 12.00 22901.00 ▇▁▁▁▁
JADOT_Yannick 0 1 45.07 495.50 0.00 3.00 9.00 27.00 80268.00 ▇▁▁▁▁
PECRESSE_Valerie 0 1 46.67 419.01 0.00 6.00 13.00 32.00 69564.00 ▇▁▁▁▁
POUTOU_Philippe 0 1 7.45 41.47 0.00 1.00 2.00 6.00 5732.00 ▇▁▁▁▁
DUPONT_AIGNAN_Nicolas 0 1 20.05 81.22 0.00 3.00 7.00 17.00 9591.00 ▇▁▁▁▁
C19_POP15P_CS1 0 1 11.82 17.86 0.00 0.00 5.17 15.08 536.25 ▇▁▁▁▁
C19_POP15P_CS2 0 1 54.18 436.78 0.00 5.00 15.13 40.36 68343.41 ▇▁▁▁▁
C19_POP15P_CS3 0 1 148.16 3186.69 0.00 5.00 19.83 55.70 558207.31 ▇▁▁▁▁
C19_POP15P_CS4 0 1 218.08 1845.06 0.00 19.44 49.77 135.00 269298.92 ▇▁▁▁▁
C19_POP15P_CS5 0 1 245.49 1743.72 0.00 21.78 56.31 149.89 225097.04 ▇▁▁▁▁
C19_POP15P_CS6 0 1 185.08 864.01 0.00 20.61 55.00 135.00 78049.73 ▇▁▁▁▁
C19_POP15P_CS7 0 1 418.71 2564.74 0.00 50.00 110.99 281.25 345027.83 ▇▁▁▁▁
C19_POP15P_CS8 0 1 255.37 2551.22 0.00 18.20 44.66 116.70 327076.11 ▇▁▁▁▁
AREA 0 1 15.76 17.32 0.03 6.51 10.96 18.94 757.42 ▇▁▁▁▁
POPULATION 0 1 1862.19 15018.55 1.00 197.00 454.00 1147.00 2175601.00 ▇▁▁▁▁

2.2 Compare losses due to preprocessing

We lose some of the registered voters because the census data only covers individuals living in mainland France; the analysis therefore excludes the overseas territories and voters living abroad. Let us compare the original and combined voting data to assess the losses and potential biases.

Code
rename_elec <- c(
  MACRON_Emmanuel = "Macron",
  LE_PEN_Marine = "Le Pen",
  MELENCHON_Jean_Luc = "Mélenchon",
  LEFT = "Left",
  RIGHT = "Right",
  NON_VOTE = "No vote",
  ARTHAUD_Nathalie = "Arthaud",
  ROUSSEL_Fabien = "Roussel",
  LASSALLE_Jean = "Lassalle",
  ZEMMOUR_Eric = "Zemmour",
  HIDALGO_Anne = "Hidalgo",
  JADOT_Yannick = "Jadot",
  PECRESSE_Valerie = "Pécresse",
  POUTOU_Philippe = "Poutou",
  DUPONT_AIGNAN_Nicolas = "Dupont-Aignan",
  ABSTENTIONS = "Abstention",
  BLANK = "Blank",
  INVALID = "Invalid",
  EXPRESSED = "Expressed",
  REGISTERED = "Registered")

elec_cols <- setdiff(names(mun_elec), c("ID_MUN", "NAME_MUN", "NAME_DEP"))
pct <- function(x) scales::percent(x,accuracy = .1)
summarize_elec <- function(dt) {
  
  dt <- data.table(
    Municipalities = nrow(dt),
    Departements = length(unique(dt$NAME_DEP)),
    dt[,lapply(.SD, sum), .SDcols = elec_cols]
  )
  
  
  dt[,EXPRESSED := NA]
  dt[,REGISTERED := sum(.SD), .SDcols = elec_cols]
  dt[,EXPRESSED := REGISTERED - ABSTENTIONS - BLANK - INVALID]
  
  show_dt <- data.table(t(dt),keep.rownames = TRUE)
  
  dont_use <- c("Municipalities","Departements")
  show_dt[!rn %in% dont_use,
          "% of REGISTERED" := pct(V1/dt[["REGISTERED"]])]
  
  dont_use <- c(dont_use, "ABSTENTIONS", "BLANK", "INVALID", "REGISTERED")
  show_dt[!rn %in% dont_use,
          "% of EXPRESSED" := pct(V1/dt[["EXPRESSED"]])]
  names(show_dt)[1:2] <- c("Indicator", "Count")
  show_dt
}  

elec_original <- summarize_elec(mun_elec)
elec_combined <- summarize_elec(mun_elec2census)
elec_combined <- cbind(Loss = elec_original[["Count"]] - elec_combined[["Count"]], elec_combined[,-1])
elec_combined <- cbind("% Loss" = pct(elec_combined[["Loss"]]/elec_combined[["Count"]]), elec_combined)
elec_original[Indicator %in% names(rename_elec),Indicator := rename_elec[Indicator]]
kable(cbind(elec_original, elec_combined)) |> 
  kable_classic(full_width = F) |> 
  add_header_above(c(" " = 1, "Full election data" = 3, "Combined election data" = 5)) |> 
  pack_rows(group_label = "Units", 1,2,hline_after = TRUE) |> 
  pack_rows(group_label = "Not expressed", 3,5,hline_after = TRUE) |> 
  pack_rows(group_label = "Expressed", 6,17,hline_after = TRUE) |> 
  pack_rows(group_label = "Totals", 18,19,hline_after = TRUE) |> 
  column_spec(1, bold = T) 
                            Full election data              Combined election data
Indicator              Count  % of Reg.  % of Exp.   % Loss      Loss     Count  % of Reg.  % of Exp.
-----------------------------------------------------------------------------------------------------
Units
Municipalities         35245                           1.2%       425     34820
Departements             107                          11.5%        11        96
Not expressed
Abstention          12824169      26.3%              18.7%   2024709  10799460      23.8%
Blank                 543609       1.1%               4.3%     22512    521097       1.1%
Invalid               247151       0.5%               9.6%     21577    225574       0.5%
Expressed
Arthaud               197094       0.4%       0.6%    5.1%      9630    187464       0.4%       0.6%
Roussel               802422       1.6%       2.3%    1.1%      9075    793347       1.7%       2.3%
Macron               9783058      20.1%      27.8%    4.2%    394102   9388956      20.7%      27.8%
Lassalle             1101387       2.3%       3.1%    1.5%     15913   1085474       2.4%       3.2%
Le Pen               8133828      16.7%      23.2%    2.5%    201602   7932226      17.5%      23.5%
Zemmour              2485226       5.1%       7.1%    3.2%     76651   2408575       5.3%       7.1%
Mélenchon            7712520      15.8%      22.0%    6.0%    438354   7274166      16.0%      21.5%
Hidalgo               616478       1.3%       1.8%    4.6%     27205    589273       1.3%       1.7%
Jadot                1627853       3.3%       4.6%    3.7%     58654   1569199       3.5%       4.6%
Pécresse             1679001       3.4%       4.8%    3.3%     53839   1625162       3.6%       4.8%
Poutou                268904       0.6%       0.8%    3.7%      9502    259402       0.6%       0.8%
Dupont-Aignan         725176       1.5%       2.1%    3.9%     27093    698083       1.5%       2.1%
Totals
Expressed           35132947      72.1%     100.0%    3.9%   1321620  33811327      74.5%     100.0%
Registered          48747876     100.0%               7.5%   3390418  45357458     100.0%

2.3 Check for problems with zeros

Because CoDa models cannot handle cases where dependent or independent compositional variables contain zero proportions, the zeros issue must be addressed before estimating the models. We first use amalgamations and then impute the remaining zeros by adding a small constant. The amalgamation combines several professional categories or political candidates into larger blocks. Note, however, that our proposed grouping is ad hoc; a case study in electoral sociology should certainly devote more thought to these choices.

2.3.1 In the vote data

We first check for zeros in the dependent variables.

Code
zero_summary <- function(dt) dt[,lapply(.SD, function(x) pct(sum(x==0)/.N))]
t(zero_summary(mun_elec2census[,..elec_cols]))
                      [,1]   
ABSTENTIONS           "0.0%" 
BLANK                 "11.6%"
INVALID               "24.2%"
ARTHAUD_Nathalie      "26.4%"
ROUSSEL_Fabien        "8.5%" 
MACRON_Emmanuel       "0.1%" 
LASSALLE_Jean         "2.0%" 
LE_PEN_Marine         "0.1%" 
ZEMMOUR_Eric          "1.7%" 
MELENCHON_Jean_Luc    "0.5%" 
HIDALGO_Anne          "13.3%"
JADOT_Yannick         "5.9%" 
PECRESSE_Valerie      "2.2%" 
POUTOU_Philippe       "20.5%"
DUPONT_AIGNAN_Nicolas "6.7%" 

To reduce the number of zero votes, we group the minor candidates into a “left” block of five candidates and a “right” block of four. Additionally, we group invalid votes, blank votes, and abstentions into the “No vote” block. After these amalgamations, we obtain a visual impression of the zero patterns in the new composition using the zCompositions::zPatterns() function. The graph reveals that almost \(98.7\%\) of the municipalities now have compositions without zeros.

Code
left_bloc <- c("ARTHAUD_Nathalie", "ROUSSEL_Fabien", "HIDALGO_Anne", "JADOT_Yannick", "POUTOU_Philippe")
right_bloc <- c("ZEMMOUR_Eric", "PECRESSE_Valerie", "LASSALLE_Jean", "DUPONT_AIGNAN_Nicolas")

VOTE <- cbind(
  mun_elec2census[,c("MACRON_Emmanuel", "LE_PEN_Marine", "MELENCHON_Jean_Luc")],
  LEFT = as.integer(rowSums(mun_elec2census[,..left_bloc])),
  RIGHT = as.integer(rowSums(mun_elec2census[,..right_bloc])),
  NON_VOTE = as.integer(rowSums(mun_elec2census[,c("ABSTENTIONS", "INVALID","BLANK"),])))

colnames(VOTE) <- rename_elec[colnames(VOTE)]

zPatterns(
  VOTE,
  bar.labels = TRUE,
  label="0")

Patterns ('+' means 0, '-' means No 0) 

 Patt.ID Macron Le Pen Mélenchon Left Right No vote No.Unobs Patt.Perc
       1      -      -         -    -     -       -        0     98.67
       2      -      -         -    -     -       +        1      0.01
       3      -      -         -    -     +       -        1      0.02
       4      -      -         -    +     -       -        1      0.67
       5      -      -         -    +     -       +        2      0.01
       6      -      -         +    -     -       -        1      0.30
       7      -      -         +    +     -       -        2      0.12
       8      -      -         +    +     -       +        3      0.00
       9      -      +         -    -     -       -        1      0.05
      10      -      +         -    +     -       -        2      0.01
      11      -      +         -    +     +       -        3      0.00
      12      -      +         +    -     -       -        2      0.00
      13      +      -         -    -     -       -        1      0.07
      14      +      -         -    -     -       +        2      0.00
      15      +      -         -    +     -       -        2      0.01
      16      +      +         -    -     -       -        2      0.01
      17      +      +         +    +     +       -        5      0.04

Percentage cells by component 
   Macron    Le Pen Mélenchon      Left     Right   No vote 
     0.13      0.11      0.46      0.86      0.06      0.03 

Overall percentage cells: 0.28% 

The remaining zeros are imputed by simply adding one vote to all candidates (or blocks) in each municipality with at least one zero. This imputation affected 464 municipalities, adding a total of 2784 fictive votes.

Code
is_zero_row <- 0 != rowSums(VOTE == 0) 
VOTE_IMP <- VOTE + is_zero_row

cbind(
  sum(is_zero_row),
  sum(is_zero_row) * ncol(VOTE),
  sum(VOTE_IMP) - sum(VOTE))
     [,1] [,2] [,3]
[1,]  464 2784 2784

2.3.1.1 Before and after imputation

We compare the data distribution in the original and imputed data sets to ensure that the imputation does not distort our analysis. Comparing the two distributions below reveals no noticeable changes in the mean but a slight increase in the variance. The effect on the variance is not surprising since our imputation method replaces zeros (red bars in the graphic) with points close to the edge of the triangle, which are extreme values in the simplex sense.
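The remark that points near the edge of the triangle are extreme in the simplex sense can be made concrete: the Aitchison norm of a composition grows without bound as one part approaches zero. The following small illustration (not part of the article's code) computes the norm through the clr coordinates:

```r
library("compositions")

# Aitchison norm = Euclidean norm of the clr coordinates
anorm <- function(x) sqrt(sum(as.numeric(clr(x))^2))

# one part shrinks towards zero while the other two stay equal
shares <- c(1e-1, 1e-2, 1e-3, 1e-4)
norms <- sapply(shares, function(s) anorm(acomp(c(s, (1 - s) / 2, (1 - s) / 2))))
norms  # strictly increasing: near-zero parts push a composition far from the center
```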

Code
simplex_isodensity <- function(
  data,
  quantiles = c(0.5,1:9,9.5)/10,
  labels = names(cen),
  col1 = "black",
  col2 = "grey45",
  plot_data = TRUE) {
  
  # code adapted from:
  # Van den Boogaart and Tolosana-Delgado (2013), page 52
  
  stopifnot(ncol(data) == 3)
  cen <- mean(data)
  var <- var(data)
  if (plot_data) 
    plot(data, pch = ".", col = "grey85", labels = labels, mp = NULL, lenMissingTck = 0.02)
  plot(cen,pch=4, col = col1, labels = labels, add = plot_data)
  for (p in quantiles) {
    r = sqrt(qchisq(p=p,df=2))
    ellipses(cen,var,r, col=col2)
  }
}
Code
opar <- par(mar=c(0,3,0,3), mfrow = c(1,2))
simplex_isodensity(acomp(VOTE_IMP[,1:3]))
title(sub = "Imputed", outer = FALSE)
simplex_isodensity(acomp(VOTE[,1:3]))
title(sub = "Original")
par(opar)

Code
opar <- par(mar=c(0,3,0,3), mfrow = c(1,2))
simplex_isodensity(acomp(VOTE_IMP[,-(1:3)]))
title(sub = "Imputed", outer = FALSE)
simplex_isodensity(acomp(VOTE[,-(1:3)]))
title(sub = "Original")
par(opar)

2.3.2 In the professional categories (PC) data

The professional categories data comes directly from the French population census provided by the INSEE. It assigns each person aged 15 and over to one of eight groups. The categories correspond to

  • C19_POP15P_CS1: Agriculteurs exploitants
  • C19_POP15P_CS2: Artisans, Commerçants, Chefs d’entreprise
  • C19_POP15P_CS3: Cadres et Professions intellectuelles supérieures
  • C19_POP15P_CS4: Professions intermédiaires
  • C19_POP15P_CS5: Employés
  • C19_POP15P_CS6: Ouvriers
  • C19_POP15P_CS7: Retraités
  • C19_POP15P_CS8: Autres sans activité professionnelle
Code
pc_cols <- names(mun_elec2census)
pc_cols <- pc_cols[grep("^C19_", pc_cols)]
t(zero_summary(mun_elec2census[,..pc_cols]))
               [,1]   
C19_POP15P_CS1 "30.3%"
C19_POP15P_CS2 "16.5%"
C19_POP15P_CS3 "16.4%"
C19_POP15P_CS4 "4.6%" 
C19_POP15P_CS5 "3.1%" 
C19_POP15P_CS6 "3.6%" 
C19_POP15P_CS7 "0.5%" 
C19_POP15P_CS8 "4.5%" 

Because the number of municipalities with a zero share in at least one of these categories is too large, we amalgamate the eight original groups into four broader categories. With the new categorization, about \(95\%\) of the municipalities have no zero value in any component.

Code
PROFCAT <- cbind(
  "Upper"   = mun_elec2census[,C19_POP15P_CS3 + C19_POP15P_CS4],
  "Workers" = mun_elec2census[,C19_POP15P_CS5 + C19_POP15P_CS6],
  "Retired" = mun_elec2census[,C19_POP15P_CS7],
  "Other"   = mun_elec2census[,C19_POP15P_CS1 + C19_POP15P_CS2 + C19_POP15P_CS8]
)

zPatterns(
  PROFCAT,
  label="0",
  bar.labels = TRUE,
  axis.labels = c(NA,NA))

Patterns ('+' means 0, '-' means No 0) 

 Patt.ID Upper Workers Retired Other No.Unobs Patt.Perc
       1     -       -       -     -        0     95.17
       2     -       -       -     +        1      0.81
       3     -       -       +     -        1      0.28
       4     -       -       +     +        2      0.03
       5     -       +       -     -        1      0.58
       6     -       +       -     +        2      0.13
       7     -       +       +     -        2      0.02
       8     -       +       +     +        3      0.01
       9     +       -       -     -        1      2.25
      10     +       -       -     +        2      0.24
      11     +       -       +     -        2      0.09
      12     +       -       +     +        3      0.02
      13     +       +       -     -        2      0.23
      14     +       +       -     +        3      0.10
      15     +       +       +     -        3      0.02
      16     +       +       +     +        4      0.02

Percentage cells by component 
  Upper Workers Retired   Other 
   2.96    1.11    0.48    1.36 

Overall percentage cells: 1.48% 

Following the same strategy as before, we add one individual to all PCs in each municipality with at least one zero component. Overall, this imputation affected 1681 municipalities, adding a total of 6724 fictive individuals.

Code
is_zero_row <- 0 != rowSums(PROFCAT == 0) 
PROFCAT_IMP <- PROFCAT + is_zero_row
cbind(
  sum(is_zero_row),
  sum(is_zero_row) * ncol(PROFCAT),
  sum(PROFCAT_IMP) - sum(PROFCAT))
     [,1] [,2] [,3]
[1,] 1681 6724 6724

2.3.2.1 Before and after imputation

We proceed to compare the distribution of the PC compositions before and after our imputation process. In the graphics below, the mean seems unaffected, while the variance grows more substantially than in the case of the VOTE composition.

Code
opar <- par(mar=c(0,4,0,4), mfrow = c(1,2))
simplex_isodensity(acomp(PROFCAT_IMP[,(1:3)]))
title(sub = "Imputed", outer = FALSE)
simplex_isodensity(acomp(PROFCAT[,(1:3)]))
title(sub = "Original")
par(opar)

Code
opar <- par(mar=c(0,3,0,3), mfrow = c(1,2))
simplex_isodensity(acomp(PROFCAT_IMP[,(2:4)]))
title(sub = "Imputed", outer = FALSE)
simplex_isodensity(acomp(PROFCAT[,(2:4)]))
title(sub = "Original")
par(opar)

3 Exploratory analysis

For the analysis, we proceed with the imputed VOTE and PC data.

Code
mun2explore <- data.table(
  ID_MUN = mun_elec2census$ID_MUN,
  REGISTERED = rowSums(VOTE_IMP))
mun2explore <- cbind(mun2explore, VOTE_IMP, PROFCAT_IMP)

3.1 Range of values

Code
logit <- function(x) log(x/(1-x))
ggplot(mun2explore[REGISTERED < 300000,],) +
  geom_point(aes(y = log(1 - `No vote`/REGISTERED), x = logit(1 - `No vote`/REGISTERED))) +
  theme_bw() +
  theme()

3.2 Heteroskedasticity

Since election data corresponds to aggregations of individual choices, we should suspect heteroskedasticity. The scatter plot below shows that the turnout rate varies with the number of registered voters, and that its volatility decreases with the size of the municipality measured by that number.
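This behavior is what one would expect if municipal results aggregate approximately independent individual choices: a share built from \(n\) independent Bernoulli draws with probability \(p\) has standard deviation \(\sqrt{p(1-p)/n}\). The following small simulation (an illustration, not part of the article's analysis; the values of \(p\) and \(n\) are arbitrary) makes the shrinking volatility visible:

```r
set.seed(1)
p <- 0.75              # hypothetical individual turnout probability
n <- c(50, 500, 5000)  # hypothetical municipality sizes (registered voters)

# simulate 10000 municipalities per size and record the sd of the turnout share
turnout_sd <- sapply(n, function(ni) sd(rbinom(1e4, size = ni, prob = p) / ni))

cbind(n = n,
      simulated   = round(turnout_sd, 4),
      theoretical = round(sqrt(p * (1 - p) / n), 4))
```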

Code
ggplot(mun2explore[REGISTERED < 300000,], aes(x = sqrt(REGISTERED), y = 1 - `No vote`/REGISTERED)) +
  geom_point() +
  geom_smooth() +
  theme_bw() +
  theme()

The graphic below reveals the same pattern for all components of the VOTE data.

Code
seq125 <- c(0, unlist(lapply(10^seq(10), "*", c(1,2,5))))
seqSqr <- seq(0,15)^2 * 1000 
namesSqr <- c(
  sprintf("..., %s]", seqSqr[-1]), 
  sprintf("(%s, ...", seqSqr[length(seqSqr)]))
mun2explore[,REGISTERED_BINS := cut(REGISTERED,c(seqSqr,Inf),labels = namesSqr)]
mun2explore_long <- 
  melt(mun2explore,
       id.vars = c("ID_MUN","REGISTERED_BINS"),
       measure.vars = colnames(VOTE))

mun2explore_long[,value_rel := value/sum(value), by = "ID_MUN"]
mun2explore_long[,variable:=factor(variable, levels = rename_elec)]
ggplot(mun2explore_long, aes(x = REGISTERED_BINS, y = value_rel)) +
  geom_boxplot(outlier.size = .1) +
  facet_wrap("variable", ncol = 3) +
  scale_y_continuous(name = "Vote share",labels = scales::percent) +
  labs(x = "Number of registered voters") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90,hjust = 1,vjust = .5)) 

ggsave(here("out/figures", "vote6_boxplots.pdf"), width = 7, height = 6)

For the professional categories, we again observe the same pattern.

Code
mun2explore_long2 <- 
  melt(mun2explore,
       id.vars = c("ID_MUN","REGISTERED_BINS"),
       measure.vars = colnames(PROFCAT))

mun2explore_long2[,value_rel := value/sum(value), by = "ID_MUN"]
ggplot(mun2explore_long2, aes(x = REGISTERED_BINS, y = value_rel)) +
  geom_boxplot(outlier.size = .1) +
  facet_wrap("variable", ncol = 2) +
  scale_y_continuous(name = "Share of professional category", labels = scales::percent) +
  labs(x = "Number of registered voters") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90,hjust = 1,vjust = .5))

ggsave(here("out/figures", "profcat4_boxplots.pdf"), width = 7, height = 6)

Code
# After the imputations, we decide on the data for modeling
mun2model <- data.frame(
  ID_MUN = mun_elec2census$ID_MUN,
  REGISTERED = rowSums(as.matrix(VOTE_IMP)),
  REGISTERED_G = Reduce("*", lapply(VOTE_IMP, as.numeric)),
  POP_DENSITY = mun_elec2census$POPULATION / mun_elec2census$AREA)
mun2model[["VOTE"]] <- as.matrix(VOTE_IMP)/mun2model$REGISTERED
mun2model[["PROFCAT"]] <- as.matrix(PROFCAT_IMP)/rowSums(PROFCAT_IMP)

4 Understanding “linear” changes in compositional variables

In this section, we illustrate how compositional variables change based on the PC composition in Paris. Below, we have the observed composition for the Paris municipality.

Code
paris_id <- "75056"
paris <- mun2model[mun2model$ID_MUN == paris_id,,drop = FALSE]
paris$PROFCAT
         Upper   Workers   Retired     Other
[1,] 0.4421546 0.1619779 0.1843559 0.2115117

In the following subsections, we take the composition of Paris as a point of departure and add a sequence of linear increments, where linearity refers to the simplex geometry. A linear increment of any multivariate vector is defined by a direction and a step size, here \(h=0.1\). Since we normalize the direction to unit length, \(h\) corresponds to the Aitchison distance between two consecutive points.
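Such a path can be sketched directly with the perturbation (`+`) and powering (scalar `*`) operations that the compositions package defines for acomp objects; the CoDa_path() function used below essentially automates this construction. The starting composition and direction in this sketch are illustrative:

```r
library("compositions")

# Aitchison norm = Euclidean norm of the clr coordinates
anorm <- function(x) sqrt(sum(as.numeric(clr(x))^2))

x0 <- acomp(c(0.44, 0.16, 0.18, 0.21))  # illustrative starting composition
u  <- acomp(c(2, 1, 1, 1))              # direction towards the first vertex
u  <- (1 / anorm(u)) * u                # normalize to unit Aitchison norm

h <- 0.1
path <- t(sapply(-3:3, function(k) x0 + (k * h) * u))
# consecutive rows of `path` are exactly h apart in Aitchison distance
```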

4.1 Changing in direction of the first vertex

Code
paris_pcseq1 <- CoDa_path(
  comp_direc = c(2,1,1,1),
  comp_from = paris$PROFCAT,
  add_opposite = TRUE,
  step_size = .1)

We consider the direction that points towards the first vertex of the simplex, which in our example corresponds to the “Upper” component. The two ternary diagrams below reveal how the PC composition changes along the path defined by this direction. The initial composition observed for Paris is shown as a red cross. One particularity of this direction is that the subcomposition excluding “Upper” remains constant along the path.

Code
opar <- par(mfrow = c(1,2),mai = c(0,.6,0,.6), oma = c(0,0,0,0), mar = c(2,3,0,3))
plot.acomp(paris_pcseq1[,-4])
plot.acomp(paris_pcseq1["0",-4, drop = FALSE], col = "red", add = TRUE, pch = 3, cex = 2)

plot.acomp(paris_pcseq1[,-1])
plot.acomp(paris_pcseq1["0",-1, drop = FALSE], col = "red", add = TRUE, pch = 3, cex = 2)

save_baseplot(pdf, here("out/figures/","Xe_change_ternary.pdf"), width = 7, height = 3)
par(opar)

Both graphics above make clear that the ratios between all components except “Upper” remain constant. This also explains why the path in the left ternary plot appears linear in the Euclidean sense. Below, we represent the same path as a stacked bar plot, where the ratio of bar heights remains constant except when “Upper” is involved.
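This ratio invariance is easy to verify numerically. The following self-contained sketch (using the compositions package with an illustrative composition, rather than the path object from the code above) rebuilds a short path in the direction \((2,1,1,1)\) and checks that the ratios among the last three parts never change:

```r
library("compositions")

x0 <- acomp(c(0.44, 0.16, 0.18, 0.21))  # illustrative starting composition
u  <- acomp(c(2, 1, 1, 1))              # direction towards the first vertex
path <- t(sapply(-5:5, function(k) x0 + (k * 0.1) * u))

# ratios among parts 2, 3 and 4 do not move along the path
r23 <- path[, 2] / path[, 3]
r24 <- path[, 2] / path[, 4]
stopifnot(diff(range(r23)) < 1e-12, diff(range(r24)) < 1e-12)
```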

Code
names(paris_pcseq1) <- colnames(PROFCAT)
window <- as.character(seq(-40,40,by = 2))
barplot(acomp(paris_pcseq1[window,,drop=FALSE]), xlab = "value of h")
save_baseplot(pdf, here("out/figures","Xe_change_barplot.pdf"))

Another way to look at the changing composition is to present the changes in all components as a function of the share of “Upper”. In this graphic, we additionally display the empirical distribution of the share of “Upper” over all municipalities. The vertical line intersects the curves at the composition observed for Paris, showing that Paris is an extreme observation in the “Upper” dimension.

Code
paris_pcseq1_gg <- data.frame(paris_pcseq1)
colnames(paris_pcseq1_gg) <- paste0("PC", 1:4)
ggplot(paris_pcseq1_gg) +
  geom_vline(xintercept = paris$PROFCAT[[1]]) +
  geom_line(aes(x = PC1, y = PC1, col = "1", lty = "1"), lwd = .51) +
  geom_line(aes(x = PC1, y = PC2, col = "2", lty = "2"), lwd = .51) +
  geom_line(aes(x = PC1, y = PC3, col = "3", lty = "3"), lwd = .51) +
  geom_line(aes(x = PC1, y = PC4, col = "4", lty = "4"), lwd = .51) +
  geom_hline(yintercept = 1, col = "black") +
  geom_boxplot(
    data = data.frame(PC1 = mun2model$PROFCAT[,1]),
    mapping = aes(x = PC1, y = 1 + .0625),
    varwidth = TRUE,
    size = .5,
    col = "red",
    alpha = .25,
    width = .05) +
  scale_x_continuous(
    labels = scales::percent,
    sec.axis = dup_axis(name= "Boxplot of % in professional category: Upper")) +
  scale_y_continuous(
    labels = scales::percent,breaks = seq(0,1,.25)) +
  scale_color_manual(
    breaks = c("1", "2", "3", "4"),
    values = c("Red", "Orange", "Dark Blue", "Yellow"),
    name = "professional\ncategory",
    labels = colnames(PROFCAT)) +
  scale_linetype_manual(
    breaks = c("1", "2", "3", "4"),
    values = 1:4,
    name = "professional\ncategory",
    labels = colnames(PROFCAT)) +
  labs(x = "% in professional category: Upper",
       y = "% in professional category") +
  coord_equal() +
  theme_bw()

ggsave(here("out/figures/Xe_change_scatter_with_details.pdf"), height = 6, width = 6)

4.2 Changing in a general direction

We repeat the same linear-increment procedure for a more general direction. Here, we choose a path along which the share of “Upper” passes through the whole interval \((0,1)\), so that the graphics remain comparable to the previous ones.

Code
dir_gen <- c(.1,.5,.25,.15)
paris_pcseq2 <- CoDa_path(
  comp_direc = dir_gen,
  comp_from = paris$PROFCAT,
  add_opposite = TRUE,
  step_size = attr(paris_pcseq1,"step_size")) # take the same step size as before

The two ternary diagrams below show a path generated along this general direction, where the red cross again marks the empirical composition observed for Paris.

Code
opar <- par(mfrow = c(1,2),mai = c(0,.6,0,.6), oma = c(0,0,0,0), mar = c(2,3,0,3))
plot.acomp(paris_pcseq2[,-4])
plot.acomp(paris_pcseq2["0",-4, drop = FALSE], col = "red", add = TRUE, pch = 3, cex = 2)

plot.acomp(paris_pcseq2[,-1])
plot.acomp(paris_pcseq2["0",-1, drop = FALSE], col = "red", add = TRUE, pch = 3, cex = 2)

save_baseplot(pdf, here("out/figures/","Xu_change_ternary.pdf"), width = 7, height = 3, bg = "transparent")
par(opar)

The simplex-linear path generated by this direction looks more complex than the previous one because it is non-linear in the Euclidean sense. Along this path, the ratios of the other components are no longer constant, and the changes are generally not monotonic. The lack of monotonicity is evident from the shares of “Other” and “Retired”, which peak close to the center of the variation and decline towards both ends of the path.

Code
names(paris_pcseq2) <- colnames(PROFCAT)
window <- as.character(seq(-80,80,by = 4))
barplot(acomp(paris_pcseq2[window,,drop=FALSE]), xlab = "value of h")

Code
save_baseplot(pdf, here("out/figures/","Xu_change_barplot.pdf"))

The intersection with the black vertical line again marks the empirical composition of Paris.

Code
paris_pcseq2_gg <- data.frame(paris_pcseq2)
colnames(paris_pcseq2_gg) <- paste0("PC", 1:4)
ggplot(paris_pcseq2_gg) +
  geom_vline(xintercept = paris$PROFCAT[[1]]) +
  geom_line(aes(x = PC1, y = PC1, col = "1", lty = "1"), lwd = .51) +
  geom_line(aes(x = PC1, y = PC2, col = "2", lty = "2"), lwd = .51) +
  geom_line(aes(x = PC1, y = PC3, col = "3", lty = "3"), lwd = .51) +
  geom_line(aes(x = PC1, y = PC4, col = "4", lty = "4"), lwd = .51) +
  geom_hline(yintercept = 1, col = "black") +
  geom_boxplot(
    data = data.frame(PC1 = mun2model$PROFCAT[,1]),
    mapping = aes(x = PC1, y = 1 + .0625),
    varwidth = TRUE,
    size = .5,
    col = "red",
    alpha = .25,
    width = .05) +
  scale_x_continuous(
    labels = scales::percent,
    sec.axis = dup_axis(name= "Boxplot of % in professional category: Upper")) +
  scale_y_continuous(
    labels = scales::percent,breaks = seq(0,1,.25)) +
  scale_color_manual(
    breaks = c("1", "2", "3", "4"),
    values = c("Red", "Orange", "Dark Blue", "Yellow"),
    name = "professional\ncategory",
    labels = colnames(PROFCAT)) +
  scale_linetype_manual(
    breaks = c("1", "2", "3", "4"),
    values = 1:4,
    name = "professional\ncategory",
    labels = colnames(PROFCAT)) +
  labs(x = "% in professional category: Upper",
       y = "% in professional category") +
  coord_equal() +
  theme_bw()

ggsave(here("out/figures/Xu_change_scatter_with_details.pdf"), height = 6, width = 6)

4.3 Changing in direction of the origin

If the direction and the starting point coincide, the path will pass through the origin of the simplex.
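This can be checked directly with the compositions package: perturbing a composition \(x\) by the \((-1)\)-power of itself gives componentwise \(x_i \cdot (1/x_i) = 1\), which closes to the uniform composition, i.e. the neutral element of the simplex. A minimal check with an illustrative composition:

```r
library("compositions")

x <- acomp(c(0.44, 0.16, 0.18, 0.21))  # illustrative composition
origin <- x + (-1) * x                 # perturbation by the inverse of x
origin                                 # the uniform composition (0.25, 0.25, 0.25, 0.25)
```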

Code
paris_pcseq3 <- CoDa_path(
  comp_direc = paris$PROFCAT,
  comp_from = paris$PROFCAT,
  add_opposite = TRUE,
  step_size = attr(paris_pcseq1,"step_size")) # take the same step size as before

The two ternary diagrams below show a path generated along this direction, where the red cross is the empirical composition for Paris and the blue star is the origin.

Code
opar <- par(mfrow = c(1,2),mai = c(0,.6,0,.6), oma = c(0,0,0,0), mar = c(2,3,0,3))
plot.acomp(paris_pcseq3[,-4])
plot.acomp(paris_pcseq3["0",-4, drop = FALSE], col = "red", add = TRUE, pch = 3, cex = 2)
plot.acomp(c(1,1,1), col = "blue", add = TRUE, pch = 8, cex = 2)

plot.acomp(paris_pcseq3[,-1])
plot.acomp(paris_pcseq3["0",-1, drop = FALSE], col = "red", add = TRUE, pch = 3, cex = 2)
plot.acomp(c(1,1,1), col = "blue", add = TRUE, pch = 8, cex = 2)
par(opar)

The simplex-linear path generated by this direction is again non-linear in the Euclidean sense and also leads to non-monotonic changes in the shares of “Other” and “Retired”.

Code
names(paris_pcseq3) <- colnames(PROFCAT)
window <- as.character(seq(-40,40,by = 2))
barplot(acomp(paris_pcseq3[window,,drop=FALSE]), xlab = "value of h")

When looking at the functional representation of this linear path, we see that the shares of all professional categories cross at the origin of the 4-part simplex, where each equals \(1/4\). The intersection with the black vertical line again marks the empirical composition of Paris.

Code
paris_pcseq3_gg <- data.frame(paris_pcseq3)
colnames(paris_pcseq3_gg) <- paste0("PC", 1:4)
ggplot(paris_pcseq3_gg) +
  geom_vline(xintercept = paris$PROFCAT[[1]]) +
  geom_vline(xintercept = 1/4, col = "grey55") +
  geom_line(aes(x = PC1, y = PC1, col = "1", lty = "1"), lwd = .51) +
  geom_line(aes(x = PC1, y = PC2, col = "2", lty = "2"), lwd = .51) +
  geom_line(aes(x = PC1, y = PC3, col = "3", lty = "3"), lwd = .51) +
  geom_line(aes(x = PC1, y = PC4, col = "4", lty = "4"), lwd = .51) +
  geom_hline(yintercept = 1, col = "black") +
  geom_boxplot(
    data = data.frame(PC1 = mun2model$PROFCAT[,1]),
    mapping = aes(x = PC1, y = 1 + .0625),
    varwidth = TRUE,
    size = .5,
    col = "red",
    alpha = .25,
    width = .05) +
  scale_x_continuous(
    labels = scales::percent,
    sec.axis = dup_axis(name= "Boxplot of % in professional category: Upper")) +
  scale_y_continuous(
    labels = scales::percent,breaks = seq(0,1,.25)) +
  scale_color_manual(
    breaks = c("1", "2", "3", "4"),
    values = c("Red", "Orange", "Dark Blue", "Yellow"),
    name = "professional\ncategory",
    labels = colnames(PROFCAT)) +
  scale_linetype_manual(
    breaks = c("1", "2", "3", "4"),
    values = 1:4,
    name = "professional\ncategory",
    labels = colnames(PROFCAT)) +
  labs(x = "% in professional category: Upper",
       y = "% in professional category") +
  coord_equal() +
  theme_bw()

4.4 Changing in the relative compensation direction

Another interesting direction compensates the relative increase of one component by a relative decrease of another. Below, we illustrate this movement for the pair “Upper” and “Workers”: the share of “Upper” moves towards its summit as \(h\) increases, while the share of “Workers” moves away from its summit.

Code
paris_pcseq4 <- CoDa_path(
  comp_direc = exp(c(1,-1,0,0)),
  comp_from = paris$PROFCAT,
  add_opposite = FALSE, # only draw the line for one side
  step_size = .1) 

opar <- par(mfrow = c(1,3),mai = c(0,.6,0,.6), oma = c(0,0,0,0), mar = c(2,3,0,3))
plot.acomp(paris_pcseq4[,-4])
plot.acomp(paris_pcseq4["0",-4, drop = FALSE], col = "red", add = TRUE, pch = 3, cex = 2)
plot.acomp(c(1,1,1), col = "blue", add = TRUE, pch = 8, cex = 2)

plot.acomp(paris_pcseq4[,-1])
plot.acomp(paris_pcseq4["0",-1, drop = FALSE], col = "red", add = TRUE, pch = 3, cex = 2)
plot.acomp(c(1,1,1), col = "blue", add = TRUE, pch = 8, cex = 2)

plot.acomp(paris_pcseq4[,-2])
plot.acomp(paris_pcseq4["0",-2, drop = FALSE], col = "red", add = TRUE, pch = 3, cex = 2)
plot.acomp(c(1,1,1), col = "blue", add = TRUE, pch = 8, cex = 2)
par(opar)

One interesting aspect of this path is that it leads to an axial symmetry in the shares of the components “Other” and “Retired”.

Code
paris_pcseq4 <- CoDa_path(
  comp_direc = exp(c(1,-1,0,0)),
  comp_from = paris$PROFCAT,
  add_opposite = TRUE,
  step_size = .1) 

names(paris_pcseq4) <- colnames(PROFCAT)
window <- as.character(seq(-70,70,by = 1))
barplot(acomp(paris_pcseq4[window,,drop=FALSE]), xlab = "value of h")

The shares of the components “Other” and “Retired” are maximized at the point where those of “Upper” and “Workers” become equal.

Code
equal_point <- which.min(abs(paris_pcseq4$Workers - paris_pcseq4$Upper))
focus_seq <- seq(-60,60)
plot(x = focus_seq,y = paris_pcseq4$Other[focus_seq + equal_point], type = "l", col = "blue", xlab = NA, ylab = NA)
lines(x = focus_seq,y = paris_pcseq4$Retired[focus_seq + equal_point], col = "red")
title(xlab = 'steps around the point of equality in "Upper" and "Workers"',
      ylab = 'shares of "Other" and "Retired"')

Another interesting point is that in clr space, the components “Other” and “Retired” are unaffected along this path.

Code
HT_dir_comp <- clr(rbind(head(paris_pcseq4, 2), paris_pcseq4["0",], tail(paris_pcseq4, 2)))
rownames(HT_dir_comp) <- paste0("compensation direction: ", rownames(HT_dir_comp))
HT_dir_vert <- clr(rbind(head(paris_pcseq1, 2), paris_pcseq4["0",], tail(paris_pcseq1, 2)))
rownames(HT_dir_vert) <- paste0("vertex direction: ", rownames(HT_dir_vert))
rbind(HT_dir_comp, HT_dir_vert)
                                  Upper    Workers    Retired       Other
compensation direction: -100 -6.4169752  6.7209609 -0.2206989 -0.08328681
compensation direction: -99  -6.3462645  6.6502502 -0.2206989 -0.08328681
compensation direction: 0     0.6540926 -0.3501069 -0.2206989 -0.08328681
compensation direction: 99    7.6544498 -7.3504640 -0.2206989 -0.08328681
compensation direction: 100   7.7251604 -7.4211747 -0.2206989 -0.08328681
vertex direction: -100       -8.0061614  2.5366444  2.6660524  2.80346454
vertex direction: -99        -7.9195589  2.5077769  2.6371849  2.77459702
vertex direction: 0           0.6540926 -0.3501069 -0.2206989 -0.08328681
vertex direction: 99          9.2277441 -3.2079907 -3.0785828 -2.94117064
vertex direction: 100         9.3143467 -3.2368582 -3.1074503 -2.97003815
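This invariance follows from the additivity of the clr transform under perturbation, \(\mathrm{clr}(x \oplus v) = \mathrm{clr}(x) + \mathrm{clr}(v)\), together with the fact that the clr scores of the compensation direction are proportional to \((1, -1, 0, 0)\). A minimal base-R sketch with illustrative shares (not the Paris composition):

```r
# Base-R sketch: perturbation is additive in clr space, so a direction
# with clr scores (1, -1, 0, 0) leaves the last two clr scores unchanged.
clr     <- function(x) log(x) - mean(log(x))
clr_inv <- function(z) exp(z) / sum(exp(z))
perturb <- function(x, v) { p <- x * v; p / sum(p) }

x <- c(.4, .2, .3, .1)   # illustrative shares
h <- 0.5
x_h <- perturb(x, clr_inv(h * c(1, -1, 0, 0)))
clr(x_h) - clr(x)        # equals h * c(1, -1, 0, 0)
```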

The functional representation of this linear path is shown below.

Code
paris_pcseq4_gg <- data.frame(paris_pcseq4)
colnames(paris_pcseq4_gg) <- paste0("PC", 1:4)
ggplot(paris_pcseq4_gg) +
  geom_vline(xintercept = paris$PROFCAT[[1]]) +
  geom_vline(xintercept = 1/4, col = "grey55") +
  geom_line(aes(x = PC1, y = PC1, col = "1", lty = "1"), lwd = .51) +
  geom_line(aes(x = PC1, y = PC2, col = "2", lty = "2"), lwd = .51) +
  geom_line(aes(x = PC1, y = PC3, col = "3", lty = "3"), lwd = .51) +
  geom_line(aes(x = PC1, y = PC4, col = "4", lty = "4"), lwd = .51) +
  geom_hline(yintercept = 1, col = "black") +
  geom_boxplot(
    data = data.frame(PC1 = mun2model$PROFCAT[,1]),
    mapping = aes(x = PC1, y = 1 + .0625),
    varwidth = TRUE,
    size = .5,
    col = "red",
    alpha = .25,
    width = .05) +
  scale_x_continuous(
    labels = scales::percent,
    sec.axis = dup_axis(name= "Boxplot of % in professional category: Upper")) +
  scale_y_continuous(
    labels = scales::percent,breaks = seq(0,1,.25)) +
  scale_color_manual(
    breaks = c("1", "2", "3", "4"),
    values = c("Red", "Orange", "Dark Blue", "Yellow"),
    name = "professional\ncategory",
    labels = colnames(PROFCAT)) +
  scale_linetype_manual(
    breaks = c("1", "2", "3", "4"),
    values = 1:4,
    name = "professional\ncategory",
    labels = colnames(PROFCAT)) +
  labs(x = "% in professional category: Upper",
       y = "% in professional category") +
  coord_equal() +
  theme_bw()

5 Y-scalar model interpretations

In this section, we look at a model that explains the turnout rate, which we treat as a real-valued (scalar) variable.

Code
mun2model$TURNOUT <- 1 - mun2model$VOTE[,6]
TURNOUT_model <- lmCoDa(
  TURNOUT ~  log(POP_DENSITY) + ilr(PROFCAT) + 1,
  data = mun2model,
  weights = REGISTERED)

summary(TURNOUT_model)

Call:
lmCoDa(formula = TURNOUT ~ log(POP_DENSITY) + ilr(PROFCAT) + 
    1, data = mun2model, weights = REGISTERED)

Weighted Residuals:
    Min      1Q  Median      3Q     Max 
-28.908  -0.467   0.194   0.833  39.456 

Coefficients:
                   Estimate Std. Error  t value     Pr(>|t|)    
(Intercept)       0.8520793  0.0009914  859.464      < 2e-16 ***
log(POP_DENSITY) -0.0163531  0.0001375 -118.914      < 2e-16 ***
ilr(PROFCAT)1    -0.0754553  0.0005802 -130.053      < 2e-16 ***
ilr(PROFCAT)2    -0.0036018  0.0006503   -5.539 0.0000000307 ***
ilr(PROFCAT)3    -0.0288969  0.0007904  -36.559      < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.355 on 34815 degrees of freedom
Multiple R-squared:  0.4746,    Adjusted R-squared:  0.4745 
F-statistic:  7862 on 4 and 34815 DF,  p-value: < 2.2e-16

The analysis of variance is given by

Code
aov <- anova(TURNOUT_model)
aov$`Sum Sq` <- round(aov$`Sum Sq`, 2)
aov$`Mean Sq` <- round(aov$`Mean Sq`, 2)
aov$`F value` <- round(aov$`F value`, 2)

kable(aov, caption = "Analysis of variance table for the X-compositional model") |> kable_classic()
Analysis of variance table for the X-compositional model
Df Sum Sq Mean Sq F value Pr(>F)
log(POP_DENSITY) 1 14328.87 14328.87 7799.13 0
ilr(PROFCAT) 3 43446.42 14482.14 7882.55 0
Residuals 34815 63963.50 1.84
Code
kable(aov, caption = "Analysis of variance table for the X-compositional model", label = "aov_TURNOUT",format = "latex", booktabs = TRUE) |> 
  kable_styling(latex_options = "hold_position") |> 
  save_kable(file = here("out/tables/aov_TURNOUT.tex"))

5.1 Interpreting the impact of compositional X

5.1.1 Semi-elasticities

In this model, the clr-transformed coefficients are equal to the semi-elasticities of the shares.

Code
clr_confint <- confint(TURNOUT_model, "PROFCAT")

ggplot(clr_confint) +
  theme_bw() +
  coord_flip() +
  geom_hline(yintercept = 0) +
  geom_segment(aes(x=X, xend=X, y=0, yend=EST, col = X)) +
  geom_point(aes(x=X, y = EST, col = X)) +
  geom_point(aes(x=X, y = `2.5 %`, col = X), pch = "[", stroke = 3) +
  geom_point(aes(x=X, y = `97.5 %`, col = X), pch = "]", stroke = 3) +
  labs(y = "semi-elasticities values", x = "professional category") +
  theme(legend.position = "none")

ggsave(here("out/figures", "yscalar_elasticity_profcat.pdf"), height = 2, width = 7)
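The clr coefficients plotted above are obtained from the fitted ilr coefficients through the ilr contrast matrix. The base-R sketch below illustrates this conversion with a hand-built Helmert-type basis and illustrative coefficient values; the exact basis used by the fitting package may differ:

```r
# Sketch: mapping ilr coefficients to clr coefficients.
# V is a (D x D-1) Helmert-type contrast matrix with orthonormal,
# zero-sum columns (the package's default basis may differ).
D <- 4
V <- matrix(0, D, D - 1)
for (j in 1:(D - 1)) {
  V[1:j, j]   <- 1 / sqrt(j * (j + 1))
  V[j + 1, j] <- -j / sqrt(j * (j + 1))
}
ilr_b <- c(-0.075, -0.0036, -0.029)   # illustrative ilr coefficients
clr_b <- as.vector(V %*% ilr_b)       # one clr coefficient per component
sum(clr_b)                            # clr coefficients always sum to zero
```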

5.1.2 Semi-elasticity differences and simplex coefficients

We may nonetheless compute the differences in semi-elasticities and also bring the coefficients back into the simplex.

Code
clr_confint <- confint(TURNOUT_model, "PROFCAT")


# to compute the interval of differences we require 
# the clr covariance matrix
varcov_Y <- TURNOUT_model$trSry$VARCOV_CLR$"TURNOUT"
varcov_X <- TURNOUT_model$trSry$VARCOV_CLR$"ilr(PROFCAT)"
varcov_b <- varcov_X * as.vector(varcov_Y)
clr_b    <- coef(TURNOUT_model, space = "clr", split = TRUE)[["ilr(PROFCAT)"]]

varcov_diff <- clr_diff <- rep(0, nrow(varcov_b))
r <- 1 # reference
for (i in seq_along(varcov_diff)) {
  clr_diff[i]    <- clr_b[i] - clr_b[r]
  varcov_diff[i] <- varcov_b[r,r] + varcov_b[i,i] - 2 * varcov_b[i,r]
}

# given the sample size, we can use normal approximations to obtain confidence intervals
diff_confint <- clr_confint
diff_confint$EST <- clr_diff
diff_confint$SD <- sqrt(varcov_diff)
diff_confint[["2.5 %"]]  <- clr_diff + qnorm(0.025) * diff_confint$SD 
diff_confint[["97.5 %"]] <- clr_diff + qnorm(0.975) * diff_confint$SD # 

# we do not have confidence intervals for simplex valued parameters
simplex_confint <- clr_confint
simplex_confint$EST <- exp(clr_b)/sum(exp(clr_b))
simplex_confint[["SD"]] <- simplex_confint[["2.5 %"]]  <- simplex_confint[["97.5 %"]] <- NA


cWay <- "clr / semi-elasticity"
dWay <- "difference in clr / se"
sWay <- "simplex"
threeWays4params <- rbind(
  cbind(Way = cWay , cen = 0  , clr_confint),
  cbind(Way = dWay , cen = 0  , diff_confint),
  cbind(Way = sWay, cen = .25, simplex_confint))
threeWays4params$X  <- factor(threeWays4params$X, levels = rev(rownames(clr_b)))


tail2 <- nrow(threeWays4params) + c(1,2)
threeWays4params <- rbind(threeWays4params, NA)
threeWays4params <- rbind(threeWays4params, NA)
threeWays4params$EST[tail2] <- c(.35,.15)
threeWays4params$Way[tail2] <- sWay
threeWays4params$X2 <- threeWays4params$X
threeWays4params$X2[tail2] <- NA
threeWays4params$X[tail2] <- "Upper"

ggplot(threeWays4params) +
  theme_bw() +
  coord_flip() +
  geom_hline(aes(yintercept = cen), col = "grey50", lty = "dotted") +
  geom_segment(aes(x=X, xend=X, y=cen, yend=EST, col = X2)) +
  geom_point(aes(x=X, y = EST, col = X2)) +
  geom_point(aes(x=X, y = `2.5 %`, col = X2), pch = "[", stroke = 3) +
  geom_point(aes(x=X, y = `97.5 %`, col = X2), pch = "]", stroke = 3) +
  scale_color_discrete_qualitative(na.translate = FALSE) +
  facet_wrap("Way", scales = "free_x") +
  labs(y = "value", x = "professional\ncategory") +
  theme(legend.position = "none")

Code
ggsave(here("out/figures", "yscalar_1clr2diff3simplex_profcat.pdf"), height = 1.65, width = 7)

5.1.3 Variation Table

The table below illustrates how the dependent variable TURNOUT would react to a change in the composition of the professional categories, where we focus on the municipality of Paris.

Code
format_VarTab <- function(VT, digits = 4, digitsU = max(digits - 2, 0), digitsP = digits,  digitsPP = digits) {
  # Print variation tables in a nicer way
  VT <- t(VT)
  VT <- as.data.frame(round(VT, digits))
  
  v <- "Variation in units"
  if (!is.null( VT[[v]])) VT[[v]] <- round(VT[[v]], digitsU)
  
  v <- "Variation in %"
  if (!is.null( VT[[v]])) VT[[v]] <- paste0(round(VT[[v]], digitsP), " %")
  
  v <- "Variation in % points"
  if (!is.null( VT[[v]])) VT[[v]] <- paste0(round(VT[[v]], digitsPP), " %")
  t(VT)
}
Code
paris_num <- which(mun2model$ID_MUN == paris_id)
VariationTable4PCparis <- function(Xdir) VariationTable(
  object = TURNOUT_model,
  Xvar = "PROFCAT",
  Xdir = Xdir,
  inc_size = .1,
  obs = paris_num,
  normalize_Xdir = FALSE)
  


paris_varTab <- cbind(
  VariationTable4PCparis("Upper"),
  VariationTable4PCparis(dir_gen))
row.names(paris_varTab)[4] <- "Variation in % points"
paris_varTab[4,] <- paris_varTab[5,] * 100
paris_varTab <- paris_varTab[-5,]

tab_atr <- attributes(VariationTable4PCparis("Upper"))
colnames(paris_varTab) <- c("Direction to vertex", "General direction")
format_VarTab(paris_varTab, 3,digitsP = 3, digitsPP =  3) |> 
  kable(caption = "Variation of the turnout when changing  PC in different directions", digits = 4, booktabs = TRUE) |> 
  add_footnote(sprintf("h = %s, alpha=%s%%", tab_atr$inc_size, round(100*tab_atr$inc_rate,2))) 
Variation of the turnout when changing PC in different directions
Direction to vertex General direction
Initial value 0.747 0.747
New value 0.752 0.741
Semi elasticity 0.063 -0.075
Variation in % points 0.472 % -0.562 %

Note: (a) h = 0.1, alpha = 5.58%; (b) h = 0.1, alpha = -4.42%; (c) h = 0.1, alpha = -4.42%; (d) h = 0.1, alpha = -4.42%

5.1.4 Variation scenario

We now evaluate, for Paris, how changes in the PC composition would impact the predicted turnout. In the following, we consider scenarios of change defined by the linear paths illustrated in the previous section. The graphic below shows that all scenarios pass through the observed PC composition of Paris, where the predicted turnout is about \(72.9\%\). In the hypothetical scenario where Paris has equal shares in each professional category, the turnout is predicted to drop to about \(67\%\).

Code
# first scenario data
paris_pred1 <- paris[rep(1,nrow(paris_pcseq1)),]
paris_pred1$PROFCAT <- as.matrix(paris_pcseq1)
paris_pred1$VOTE <- NULL
paris_pred1$PRED_TURNOUT <- predict(TURNOUT_model, newdata = paris_pred1) 
paris_pred1$H_SEQ <- row.names(paris_pred1) <- row.names(paris_pcseq1)
paris_pred1$F_SHARE <- paris_pred1$PROFCAT[,"Upper"]

paris_pred1_gg <- melt(
  data = data.table(paris_pred1),
  id.vars = c("H_SEQ","F_SHARE","PRED_TURNOUT"),
  measure.vars = paste0("PROFCAT.", colnames(PROFCAT)))

# second scenario data
paris_pred2 <- paris[rep(1,nrow(paris_pcseq2)),]
paris_pred2$PROFCAT <- as.matrix(paris_pcseq2)
paris_pred2$VOTE <- NULL
paris_pred2$PRED_TURNOUT <- predict(TURNOUT_model, newdata = paris_pred2) 
paris_pred2$H_SEQ <- row.names(paris_pred2) <- c(row.names(paris_pcseq2))
paris_pred2$F_SHARE <- paris_pred2$PROFCAT[,"Upper"]

paris_pred2_gg <- melt(
  data = data.table(paris_pred2),
  id.vars = c("H_SEQ","F_SHARE","PRED_TURNOUT"),
  measure.vars = paste0("PROFCAT.", colnames(PROFCAT)))

# third scenario data
paris_pred3 <- paris[rep(1,nrow(paris_pcseq3) + 1),]
paris_pred3$PROFCAT <- rbind(as.matrix(paris_pcseq3),rep(.25,4)) # add the origin
paris_pred3$VOTE <- NULL
paris_pred3$PRED_TURNOUT <- predict(TURNOUT_model, newdata = paris_pred3) 
paris_pred3$H_SEQ <- row.names(paris_pred3) <- c(row.names(paris_pcseq3),"101")
paris_pred3$F_SHARE <- paris_pred3$PROFCAT[,"Upper"]

paris_pred3_gg <- melt(
  data = data.table(paris_pred3),
  id.vars = c("H_SEQ","F_SHARE", "PRED_TURNOUT"),
  measure.vars = paste0("PROFCAT.", colnames(PROFCAT)))


paris_pred_gg <- rbind(
  cbind("DIR" = "Direction to vertex", paris_pred1_gg),
  cbind("DIR" = "General direction", paris_pred2_gg),
  cbind("DIR" = "Origin direction", paris_pred3_gg))
paris_pred_gg[,variable := str_remove(variable,"PROFCAT.")]
paris_pred_gg[,variable := factor(variable,colnames(PROFCAT))]

hline <- data.frame(
  INT = c(paris_pred1[c("0", "0", "0"), "PRED_TURNOUT"],paris_pred3["101", "PRED_TURNOUT"]),
  DIR = c("Direction to vertex", "General direction",  "Origin direction",  "Origin direction"),
  LAB = c("(Paris)", "(Paris)", "(Paris)", "(origin)"),
  YYY = c(.85),
  JJJ = c(-.5,-.5,-.5, 1.5))

ggplot(paris_pred_gg) +
  geom_vline(aes(xintercept = INT), data = hline) +
  geom_text(aes(x = INT, y = YYY, label = LAB, vjust = JJJ), data =  hline) +
  geom_line(aes(y = value, x = PRED_TURNOUT, col =variable, lty = variable)) +
  scale_y_continuous(breaks = seq(0,10)/10,labels = scales::percent) +
  scale_x_continuous(labels = scales::percent,sec.axis = dup_axis(name = NULL), limits = c(.5,.9)) +
  coord_flip() +
  theme_bw() +
  labs(y = "share in professional category", x = "predicted turnout",
       col = "professional\ncategory", lty = "professional\ncategory") +
  facet_wrap("DIR", ncol = 1, scales = "free_x")

For the first two scenarios, the same graph becomes:

Code
ggplot(paris_pred_gg[DIR != "Origin direction",]) +
  geom_vline(aes(xintercept = INT), data = hline[1:2,]) +
  geom_text(aes(x = INT, y = YYY, label = LAB, vjust = JJJ), data =  hline[1:2,]) +
  geom_line(aes(y = value, x = PRED_TURNOUT, col =variable, lty = variable)) +
  scale_y_continuous(breaks = seq(0,10)/10,labels = scales::percent) +
  scale_x_continuous(labels = scales::percent,sec.axis = dup_axis(name = NULL), limits = c(.5,.9)) +
  theme_bw() +
  coord_flip() +
  labs(y = "share in professional category", x = "predicted turnout",
       col = "professional\ncategory", lty = "professional\ncategory") +
  facet_wrap("DIR", ncol = 1, scales = "free_x")

An alternative and maybe more straightforward way to show the same information is to look simultaneously at the changes of the dependent and independent variables as a function of \(h\).

Code
paris_pred_gg2 <- copy(paris_pred_gg[DIR != "Origin direction",])
paris_pred_gg2[, COMPO := "Change in X"]

paris_pred_gg3 <- copy(paris_pred_gg2[variable == "Upper",])
paris_pred_gg3[, COMPO := "Change in Y"]
paris_pred_gg3[, variable := NA]
paris_pred_gg3[,value := PRED_TURNOUT]

paris_pred_ggX <- rbind(paris_pred_gg2, paris_pred_gg3)
paris_pred_ggX[, COMPO := factor(COMPO, c("Change in Y", "Change in X"))]
paris_pred_ggX[,H_SEQ := as.integer(H_SEQ)]
rm(paris_pred_gg2, paris_pred_gg3)

ggplot(paris_pred_ggX[-30 <= H_SEQ & H_SEQ <= 30]) +
  geom_vline(xintercept = 0) +
  # geom_text(aes(x = INT, y = YYY, label = LAB, vjust = JJJ), data =  hline[1:2,]) +
  geom_line(aes(y = value, x = H_SEQ/10, col =variable, lty = variable)) +
  scale_y_continuous(breaks = seq(0,10)/10,labels = scales::percent) +
  scale_colour_manual(values = scales::hue_pal()(4), na.value = "#000000", breaks = levels(paris_pred_ggX$variable))  + 
  scale_linetype_manual(values = 1:4, na.value = 1, breaks = levels(paris_pred_ggX$variable))  + 
  # scale_x_continuous(labels = scales::percent,sec.axis = dup_axis(name = NULL), limits = c(.5,.9)) +
  theme_bw() +
  labs(y = "shares of\nprofessional categories (bottom) and predicted turnout (top)",
       x = "value of h",
       col = "professional\ncategory", lty = "professional\ncategory") +
  facet_grid(COMPO ~ DIR, space = "free_y", scales = "free")

Code
ggsave(here("out/figures/x_compo_variation_scatter.pdf"), height = 4, width = 7)

6 Y-compositional model interpretations

In this section, we look at the case of a Y-compositional model that explains the entire composition of votes.

Code
VOTE_model <- lmCoDa(
  ilr(VOTE) ~  log(POP_DENSITY) + ilr(PROFCAT),
  data = mun2model)
summary(VOTE_model)
Response ilr(VOTE)1 :

Call:
lmCoDa(formula = `ilr(VOTE)1` ~ log(POP_DENSITY) + ilr(PROFCAT), 
    data = mun2model)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.95546 -0.21157  0.00821  0.22060  2.38820 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)       0.213028   0.006649  32.040   <2e-16 ***
log(POP_DENSITY) -0.043064   0.001518 -28.377   <2e-16 ***
ilr(PROFCAT)1     0.203781   0.003510  58.063   <2e-16 ***
ilr(PROFCAT)2    -0.115172   0.003773 -30.528   <2e-16 ***
ilr(PROFCAT)3    -0.009089   0.003933  -2.311   0.0208 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3521 on 34815 degrees of freedom
Multiple R-squared:  0.1209,    Adjusted R-squared:  0.1208 
F-statistic:  1197 on 4 and 34815 DF,  p-value: < 2.2e-16


Response ilr(VOTE)2 :

Call:
lmCoDa(formula = `ilr(VOTE)2` ~ log(POP_DENSITY) + ilr(PROFCAT), 
    data = mun2model)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.5772 -0.2540 -0.0054  0.2498  3.7606 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)      -0.528380   0.008809  -59.98   <2e-16 ***
log(POP_DENSITY)  0.034209   0.002010   17.02   <2e-16 ***
ilr(PROFCAT)1    -0.145239   0.004650  -31.24   <2e-16 ***
ilr(PROFCAT)2     0.074919   0.004998   14.99   <2e-16 ***
ilr(PROFCAT)3     0.062590   0.005211   12.01   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4665 on 34815 degrees of freedom
Multiple R-squared:  0.04082,   Adjusted R-squared:  0.04071 
F-statistic: 370.4 on 4 and 34815 DF,  p-value: < 2.2e-16


Response ilr(VOTE)3 :

Call:
lmCoDa(formula = `ilr(VOTE)3` ~ log(POP_DENSITY) + ilr(PROFCAT), 
    data = mun2model)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.29032 -0.18463  0.01998  0.21421  2.18447 

Coefficients:
                  Estimate Std. Error  t value Pr(>|t|)    
(Intercept)      -0.747150   0.007187 -103.956   <2e-16 ***
log(POP_DENSITY) -0.003843   0.001640   -2.343   0.0191 *  
ilr(PROFCAT)1    -0.120232   0.003794  -31.692   <2e-16 ***
ilr(PROFCAT)2     0.061290   0.004078   15.029   <2e-16 ***
ilr(PROFCAT)3    -0.007321   0.004251   -1.722   0.0851 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3806 on 34815 degrees of freedom
Multiple R-squared:  0.03076,   Adjusted R-squared:  0.03065 
F-statistic: 276.3 on 4 and 34815 DF,  p-value: < 2.2e-16


Response ilr(VOTE)4 :

Call:
lmCoDa(formula = `ilr(VOTE)4` ~ log(POP_DENSITY) + ilr(PROFCAT), 
    data = mun2model)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.40942 -0.20221 -0.00447  0.19254  2.09011 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)       0.474646   0.006319  75.110   <2e-16 ***
log(POP_DENSITY) -0.098307   0.001442 -68.158   <2e-16 ***
ilr(PROFCAT)1    -0.032753   0.003336  -9.819   <2e-16 ***
ilr(PROFCAT)2     0.055440   0.003586  15.462   <2e-16 ***
ilr(PROFCAT)3     0.064590   0.003738  17.279   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3346 on 34815 degrees of freedom
Multiple R-squared:  0.1668,    Adjusted R-squared:  0.1667 
F-statistic:  1743 on 4 and 34815 DF,  p-value: < 2.2e-16


Response ilr(VOTE)5 :

Call:
lmCoDa(formula = `ilr(VOTE)5` ~ log(POP_DENSITY) + ilr(PROFCAT), 
    data = mun2model)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.8703 -0.1740 -0.0021  0.1682  5.8459 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)      0.242161   0.005825  41.574   <2e-16 ***
log(POP_DENSITY) 0.039056   0.001329  29.377   <2e-16 ***
ilr(PROFCAT)1    0.112449   0.003075  36.573   <2e-16 ***
ilr(PROFCAT)2    0.031237   0.003305   9.451   <2e-16 ***
ilr(PROFCAT)3    0.047434   0.003446  13.767   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3085 on 34815 degrees of freedom
Multiple R-squared:  0.0626,    Adjusted R-squared:  0.0625 
F-statistic: 581.3 on 4 and 34815 DF,  p-value: < 2.2e-16
Code
aov <- anova(VOTE_model)
aov$`approx F` <- round(aov$`approx F`)
aov$Pillai <- round(aov$Pillai, 3)

kable(aov, caption = "Analysis of variance table for the y-compositional model") |> kable_classic()
Analysis of variance table for the y-compositional model
Df Pillai approx F num Df den Df Pr(>F)
(Intercept) 1 0.847 38609 5 34811 0
log(POP_DENSITY) 1 0.195 1688 5 34811 0
ilr(PROFCAT) 3 0.184 455 15 104439 0
Residuals 34815
Code
kable(aov, caption = "Analysis of variance table for the y-compositional model", label = "aov_VOTE", format = "latex", booktabs = TRUE) |> 
  kable_styling(latex_options = "hold_position") |> 
  save_kable(file = here("out/tables/aov_VOTE.tex"))

6.1 Interpreting the impact of a scalar X

6.1.1 Semi-elasticity differences

The clr coefficients correspond to differences from the mean semi-elasticity.

Code
flevels <- c("-Mean semi elasticity", rev(rename_elec))
clr_confint <- confint(VOTE_model, "log(POP_DENSITY)")
clr_confint$Y <- factor(clr_confint$Y, flevels)
clr_confintB <- clr_confint[1,]
clr_confintB$Y <- factor("-Mean semi elasticity", levels = flevels)
clr_confintB[,3:6] <- NA
clr_T <- coef(VOTE_model, "clr")
ME <- as(fitted(VOTE_model, "simplex"),"matrix") %*% t(clr_T[-1,])
ME <- melt(data.table(ME), variable.name = "X", value.name = "EST")
ME[,Y := factor("-Mean semi elasticity", levels = flevels)]

ggplot(rbind(clr_confint,clr_confintB)) +
  theme_bw() +
  coord_flip() +
  geom_hline(yintercept = 0) +
  geom_segment(aes(x=Y, xend=Y, y=0, yend=EST, col = Y)) +
  geom_point(aes(x=Y, y = EST, col = Y)) +
  geom_point(aes(x=Y, y = `2.5 %`, col = Y), pch = "[", stroke = 3) +
  geom_point(aes(x=Y, y = `97.5 %`, col = Y), pch = "]", stroke = 3) +
  geom_violin(aes(y = EST, x = Y),data = ME[X=="log(POP_DENSITY)",]) +
  labs(y = "clr parameter values & confidence intervals\ndistribution of negative mean semi elasticity", x = NULL) +
  theme(legend.position = "none")

ggsave(here("out/figures", "clr_b_popdens_with_details.pdf"), height = 2.25, width = 7)
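The identity behind this plot can be checked with a minimal base-R sketch. Following the accompanying article, the semi-elasticity of share \(s_j\) with respect to a scalar covariate is \(b_j - \sum_k s_k b_k\), where \(b\) are the clr coefficients, so each clr coefficient equals the corresponding semi-elasticity minus the arithmetic mean of all semi-elasticities. The numbers below are illustrative, not the fitted values:

```r
# Sketch (illustrative numbers): semi-elasticity of share j wrt a scalar
# covariate is b_j - sum_k s_k * b_k, with b the clr coefficients and s
# the fitted shares; hence each clr coefficient is the semi-elasticity
# centered by the arithmetic mean of the semi-elasticities.
b_clr <- c(0.02, -0.04, 0.03, -0.01)   # clr coefficients (sum to zero)
s     <- c(0.40, 0.30, 0.20, 0.10)     # fitted shares (sum to one)
semi_el <- b_clr - sum(s * b_clr)
max(abs(b_clr - (semi_el - mean(semi_el))))  # ~0: the identity holds
```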

We can use them to construct differences between the semi-elasticities of two components of the dependent shares with respect to the same explanatory variable.

Code
clr_confint <- confint(VOTE_model, "log(POP_DENSITY)",y_ref = 1)
clr_confint$Y <- factor(clr_confint$Y, flevels)
clr_confintB <- clr_confint[1,]
clr_confintB[,3:6] <- NA

ggplot(rbind(clr_confint,clr_confintB)) +
  theme_bw() +
  coord_flip() +
  geom_hline(yintercept = 0) +
  geom_segment(aes(x=Y, xend=Y, y=0, yend=DIFF, col = Y)) +
  geom_point(aes(x=Y, y = DIFF, col = Y)) +
  geom_point(aes(x=Y, y = `2.5 %`, col = Y), pch = "[", stroke = 3) +
  geom_point(aes(x=Y, y = `97.5 %`, col = Y), pch = "]", stroke = 3) +
  labs(y = "semi-elasticity differences & confidence intervals\n(in the denominator is Macron)", x = NULL) +
  theme(legend.position = "none")

ggsave(here("out/figures", "SE_diff_popdens_with_details.pdf"), height = 2, width = 7)

6.1.2 Variation table

We can use Taylor approximations to evaluate how the voting behavior reacts to a small change in the population density. Here, we increment log(POP_DENSITY) by \(0.1\), which corresponds to a population density increase of about 10%.

Code
paris_num <- which(mun2model$ID_MUN == paris_id)
paris_varTab <- VariationTable(VOTE_model, "log(POP_DENSITY)",inc_size = .1, obs = paris_num, Ytotal = mun2model$REGISTERED[paris_num])

format_VarTab(paris_varTab, 4, 0, 2, 2) |> 
  kable(caption = "Increase of the population density by 10%", digits = 4, booktabs = TRUE)
Increase of the population density by 10%
Macron Le Pen Mélenchon Left Right No vote
Initial parts 0.2500 0.1582 0.1729 0.0828 0.0880 0.2481
New parts 0.2505 0.1575 0.1734 0.0828 0.0871 0.2487
Semi elasticity 0.0198 -0.0411 0.0313 -0.0011 -0.1077 0.0230
Variation in % 0.2 % -0.41 % 0.31 % -0.01 % -1.08 % 0.23 %
Variation in % points 0.05 % -0.06 % 0.05 % 0 % -0.09 % 0.06 %
Variation in units 678 -889 739 -13 -1297 781
Code
format_VarTab(paris_varTab, 4, 0, 2, 2) |> 
  kable(caption = "Increase of the population density by 10\\%", digits = 4, booktabs = TRUE,
        format = "latex", label = "VarTab_VOTE_logPDENS_change", linesep = "") |> 
  kable_styling(latex_options = "hold_position") |> 
  save_kable(here("out/tables/VarTab_VOTE_logPDENS_change.tex"))

It is easy to verify that the variations in % points and in units each sum to zero, i.e., the changes across components compensate each other.

Code
t(t(round(rowSums(paris_varTab),10)))
                               [,1]
Initial parts          1.0000000000
New parts              1.0000000000
Semi elasticity       -0.0757855744
Variation in %        -0.7578557439
Variation in % points  0.0000000000
Variation in units    -0.0000000001
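The entries of such variation tables follow from a first-order Taylor approximation: for an increment \(h\), each share changes in relative terms by roughly \(h\) times its semi-elasticity. A base-R sketch with illustrative numbers (not the fitted values):

```r
# Sketch of the first-order approximation behind the variation table
# (illustrative shares and semi-elasticities, not the fitted values)
s       <- c(0.25, 0.16, 0.17, 0.08, 0.09, 0.25)   # initial parts
semi_el <- c(0.02, -0.04, 0.03, 0.00, -0.11, 0.02) # semi-elasticities
h       <- 0.1                                     # increment in log(POP_DENSITY)

var_pct    <- 100 * h * semi_el       # variation in %
new_s      <- s * (1 + h * semi_el)   # new parts (first order)
var_points <- 100 * (new_s - s)       # variation in % points
```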

6.1.3 Variation scenarios

In this scenario, the population density changes on a grid that is regular in logs. Even sizeable changes, such as a tenfold increase in population density or a reduction to one-tenth, would lead to little variation in the predicted vote shares. Only when the population density of Paris shrinks to \(0.1\%\) of its original value do we see considerable differences.

Code
paris_size_scenario <- paris[rep(1,101),]
paris_size_scenario$POP_DENSITY <- paris_size_scenario$POP_DENSITY * exp(seq(-log(1000),log(10),length.out = 101))
paris_size_scenario$VOTE <- as(predict(VOTE_model,space = "simplex", newdata = paris_size_scenario), "matrix")
colnames(paris_size_scenario$VOTE) <- colnames(VOTE)

paris_size_scenario_gg <- melt(
  data = data.table(paris_size_scenario),
  id.vars = "POP_DENSITY",
  measure.vars = make.names(paste0("VOTE.",colnames(VOTE))))


unmake_votenames <- structure(colnames(VOTE), names = make.names(paste0("VOTE.",colnames(VOTE))))
paris_size_scenario_gg[,variable := factor(unmake_votenames[variable],unmake_votenames)]
ggplot(paris_size_scenario_gg) +
  geom_vline(xintercept = paris$POP_DENSITY) +
  geom_line(aes(x = POP_DENSITY, y = value, col = variable, lty = variable)) +
  scale_x_continuous(trans = "log10") +
  scale_y_continuous(labels = scales::percent) +
  labs(y = "vote share", x = "population density", col = "candidate\nor block", lty = "candidate\nor block") +
  theme_bw()

ggsave(here("out/figures","y_compo_variation_scatter.pdf"), height = 2, width = 7)

6.2 Interpreting the impact of an X-compositional variable

6.2.1 Elasticity differences

For X-compositional variables, the clr coefficients also correspond to differences from mean elasticities, where these means are computed for each component of \(X\) over all dependent shares.

Code
flevels2 <- c("-Mean elasticity",flevels)
clr_confint <- confint(VOTE_model, "PROFCAT")
clr_confint$X <- factor(clr_confint$X, colnames(PROFCAT))
clr_confint$Y <- factor(clr_confint$Y, levels = flevels2)
clr_confint_all <- clr_confint
clr_confint_all$Y <- factor("-Mean elasticity", flevels2)
clr_confint_all$SD <- clr_confint_all$EST
clr_confint_all$EST <- NA
clr_confint_all$`2.5 %` <- NA
clr_confint_all$`97.5 %` <- NA
clr_confint_all <- unique(clr_confint_all)
levels(ME$Y)[1] <- "-Mean elasticity"

ggplot(rbind(clr_confint,clr_confint_all)) +
  theme_bw() +
  coord_flip() +
  geom_hline(yintercept = 0) +
  geom_segment(aes(x=Y, xend=Y, y=0, yend=EST, col = Y)) +
  geom_point(aes(x=Y, y = EST, col = Y)) +
  geom_point(aes(x=Y, y = `2.5 %`, col = Y), pch = "[", stroke = 3) +
  geom_point(aes(x=Y, y = `97.5 %`, col = Y), pch = "]", stroke = 3) +
  geom_violin(aes(y = EST, x = Y),data = ME[X!="log(POP_DENSITY)"]) +
  facet_wrap("X") +
  labs(y = "parameter values & confidence intervals\ndistribution of negative mean elasticity", x = NULL) +
  theme(legend.position = "none")

ggsave(here("out/figures", "clr_B_profcat_with_details.pdf"), height = 4, width = 7)

Code
clr_confint <- confint(VOTE_model, "PROFCAT",y_ref = 1)
clr_confint$Y <- factor(clr_confint$Y,levels = flevels)
clr_confint$X <- factor(clr_confint$X,levels = colnames(PROFCAT))
ggplot(clr_confint) +
  theme_bw() +
  coord_flip() +
  geom_hline(yintercept = 0) +
  geom_segment(aes(x=Y, xend=Y, y=0, yend=DIFF, col = Y)) +
  geom_point(aes(x=Y, y = DIFF, col = Y)) +
  geom_point(aes(x=Y, y = `2.5 %`, col = Y), pch = "[", stroke = 3) +
  geom_point(aes(x=Y, y = `97.5 %`, col = Y), pch = "]", stroke = 3) +
  facet_wrap("X") +
  labs(y = "elasticity differences & confidence intervals\n(in the denominator is Macron)", x = NULL) +
  theme(legend.position = "none")

ggsave(here("out/figures", "E_diff_profcat_with_details.pdf"), height = 3.5, width = 7)

6.2.2 Variation table

Similar to the previous variation table, we evaluate how the voting behavior reacts to a slight change in the explanatory composition. However, since this variable is multivariate, we must specify the change direction. For a change towards the vertex “Upper” we get the following:

Code
VariationTable4PCparis <- function(Xdir) VariationTable(
  object = VOTE_model,
  Xvar = "PROFCAT",
  Xdir = Xdir,
  inc_size = .1,
  obs = paris_num,
  normalize_Xdir = FALSE,
  Ytotal = mun2model$REGISTERED[paris_num])

paris_varTab <- VariationTable4PCparis(Xdir = "Upper")
tab_atr <- attributes(paris_varTab)
paris_varTab <- format_VarTab(paris_varTab, 3, 0,digitsP = 2,digitsPP = 2)
paris_varTab |> 
  kable(caption = "Change of (PC) in direction of the vertex \"Upper\"", digits = 4, booktabs = TRUE) |> 
    add_footnote(sprintf("h = %s, alpha=%s%%", tab_atr$inc_size, round(100*tab_atr$inc_rate,2)[1]))  
Change of (PC) in direction of the vertex “Upper”
Macron Le Pen Mélenchon Left Right No vote
Initial parts 0.250 0.158 0.173 0.083 0.088 0.248
New parts 0.251 0.157 0.174 0.084 0.088 0.246
Elasticity 0.060 -0.074 0.059 0.087 0.013 -0.087
Variation in % 0.6 % -0.74 % 0.59 % 0.87 % 0.12 % -0.87 %
Variation in % points 0.15 % -0.12 % 0.1 % 0.07 % 0.01 % -0.22 %
Variation in units 2038 -1601 1395 981 151 -2964

Note: ah = 0.1, alpha=5.58%

For the general direction, we get the following:

Code
paris_varTab <- VariationTable4PCparis(dir_gen)
paris_varTab <- format_VarTab(paris_varTab, 3, 0,digitsP = 2,digitsPP = 2)
paris_varTab |> 
  kable(caption = "Change of PC in a general direction", digits = 4, booktabs = TRUE)
Change of PC in a general direction
Macron Le Pen Mélenchon Left Right No vote
Initial parts 0.250 0.158 0.173 0.083 0.088 0.248
New parts 0.248 0.162 0.171 0.082 0.087 0.250
Elasticity -0.072 0.246 -0.136 -0.136 -0.088 0.087
Variation in % -0.72 % 2.46 % -1.36 % -1.36 % -0.88 % 0.87 %
Variation in % points -0.18 % 0.39 % -0.23 % -0.11 % -0.08 % 0.22 %
Variation in units -2459 5325 -3215 -1537 -1056 2941

6.2.3 Variation scenarios

Let us turn to the changes induced by varying the socio-professional composition in Paris. Here, we show the changes in the explanatory composition together with the changes in the predicted shares. The graphic below displays the changes as a function of the share of the category “Upper”.

Code
paris_pred1$VOTE <- as(predict(VOTE_model, space = "simplex", newdata = paris_pred1),"matrix")
paris_pred2$VOTE <- as(predict(VOTE_model, space = "simplex", newdata = paris_pred2),"matrix")
unmake_pc_vote_names <- c(
  structure(paste0("(Vote) ", unmake_votenames), names = names(unmake_votenames)),
  structure(paste0("(PC) ", colnames(PROFCAT)), names = paste0("PROFCAT.", colnames(PROFCAT))))
legend_title <- "candidate or block\n(top six)\n\nprofessional category\n(bottom four)"

paris_pred1_gg <- melt(
  data = data.table(paris_pred1),
  id.vars = c("H_SEQ","F_SHARE","PRED_TURNOUT"),
  measure.vars = names(unmake_pc_vote_names))

paris_pred2_gg <- melt(
  data = data.table(paris_pred2),
  id.vars = c("H_SEQ","F_SHARE","PRED_TURNOUT"),
  measure.vars = names(unmake_pc_vote_names))

paris_pred_gg <- rbind(
  cbind("DIR" = "Direction to vertex", paris_pred1_gg),
  cbind("DIR" = "General direction", paris_pred2_gg))

paris_pred_gg[,COMPO := as.character(variable)]
paris_pred_gg[,SAHRE_XY := factor(unmake_pc_vote_names[COMPO], unmake_pc_vote_names)]
paris_pred_gg[,COMPO := fcase(grepl("^VOTE.", variable), "Changes in Y",
                              grepl("^PROFCAT.", variable), "Changes in X")]
paris_pred_gg[,COMPO := factor(COMPO, levels = c("Changes in Y", "Changes in X"))]

hline <- data.frame(
  INT = c(paris$PROFCAT[c(1,1), "Upper"],.25),
  DIR = c("Direction to vertex","General direction", "General direction"))

pclim <- c(.05, .95)

ggplot(paris_pred_gg) +
  geom_vline(aes(xintercept = INT), data = hline[1:2,]) +
  geom_line(aes(y = value, x = F_SHARE, col =SAHRE_XY, lty = SAHRE_XY)) +
  scale_y_continuous(breaks = seq(1,9, by = 2)/10, labels = scales::percent) +
  scale_x_continuous(breaks = seq(1,9, by = 2)/10, labels = scales::percent) +
  theme_bw() +
  labs(y = "shares of\nprofessional categories (bottom) and predicted votes (top)",
       x = "share of professional category Upper",
       col = legend_title, lty = legend_title) +
  facet_grid(rows = vars(COMPO), cols = vars(DIR), shrink = TRUE, scales = "free_y")

Code
ggsave(here("out/figures", "yx_compo_variation_scatter.pdf"), height = 4.5, width = 7)

We can draw the same graphic as a function of the value of \(h\).

Code
ggplot(paris_pred_gg[,]) +
  geom_vline(aes(xintercept = B), data = cbind(hline[1:2,],B = 0)) +
  geom_line(aes(y = value, x = as.integer(H_SEQ)/10, col =SAHRE_XY, lty = SAHRE_XY)) +
  scale_y_continuous(breaks = seq(1,9, by = 2)/10, labels = scales::percent) +
  # scale_x_continuous(breaks = seq(1,9, by = 2)/10, labels = scales::percent) +
  theme_bw() +
  labs(y = "shares of\nprofessional categories (bottom) and predicted votes (top)",
       x = "value of h",
       col = legend_title, lty = legend_title) +
  facet_grid(rows = vars(COMPO), cols = vars(DIR), shrink = TRUE, scales = "free_y", space = "free_y")

Code
ggsave(here("out/figures", "yx_compo_variation_scatter_by_h.pdf"), height = 4.5, width = 7)

We could also illustrate this scenario in a matrix of ternary diagrams. However, in this representation, we are limited to subcompositions of three components.

Code
opar <- par(mfrow = c(2,2), mai = c(.1,0,.2,0))
# keep a fixed range of h-steps (this supersedes a pclim-based filter)
filter1 <- as.character(seq(-25, 25))
colors1 <- diverging_hcl(length(filter1),palette = "Blue-Red-3")


plot(acomp(paris_pred1$PROFCAT[filter1,1:3]),  col = colors1, pch = 19)
title("Change X (direction to vertex)")
plot(acomp(paris_pred2$PROFCAT[filter1,1:3]),  col = colors1, pch = 19)
title("Change X (general direction)")
plot(acomp(paris_pred1$VOTE[filter1,1:3]),  col = colors1, pch = 19)
plot(acomp(paris_pred2$VOTE[filter1,1:3]),  col = colors1, pch = 19)
par(opar)

6.2.4 Direction of maximal share ratio variation

For each dependent share ratio, the direction of maximal variation is given by the difference of two rows of the parameter matrix \(B\), where the row indexes correspond to the two components that appear in the share ratio; the strength of this variation is measured by the norm of this difference vector. With this in mind, we can examine how the share ratios react to a change in their respective direction of maximal variation.

Code
B <- t(coef(VOTE_model,space = "clr", split = TRUE)[["ilr(PROFCAT)"]])
Dy <- nrow(B)

max_diffs <- expand.grid(yj = rownames(B), yl = rownames(B))
max_diffs <- max_diffs[as.integer(max_diffs$yj) > as.integer(max_diffs$yl),]
max_diffs[["diff"]] <-
  unlist(Map(function(yj, yl) sqrt(crossprod(B[as.integer(yj),] - B[as.integer(yl),])),max_diffs$yj,max_diffs$yl))

MatMaxDiff <- matrix(nrow = Dy, ncol = Dy, dimnames = list(rownames(B),rownames(B)))
MatMaxDiff[lower.tri(MatMaxDiff)] <- max_diffs$diff

ggplot(max_diffs, aes(y = yl, x = yj)) + 
  geom_tile(aes(fill = diff)) +
  geom_text(aes(label = round(diff,3), size = abs(diff)),fontface = "bold") +
  scale_size_continuous(range = c(3,4), guide = "none") +
  scale_x_discrete(position = "top") +
  scale_y_discrete(limits=rev) +
  scale_fill_continuous_sequential("Reds") +
  labs(x = "", y = "") +
  guides(fill = guide_colorbar(barwidth = .5, barheight = 10,title =  "value")) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 15)) +
  coord_equal()

Code
ggsave(here("out/figures", "yx_compo_maxdiff_matrix.pdf"), height = 3, width = 4.5)

The above shows that the professional composition is most useful for discriminating Le Pen’s share from that of the others. Let us now look at the variation scenario in the direction of maximal variation for the shares of “Left” and “Le Pen”, which form the strongest opposition in our example. Additionally, we look at “No vote” vs. “Le Pen”, which is the weakest opposition involving “Le Pen”, and at “Right” vs. “Mélenchon”, which is the weakest opposition overall.

Code
Mdiff_VS <- function(den = "Left", num = "Le Pen") {
  max_diff_dir <- exp(B[num,] - B[den,])
  max_diff_dir <- max_diff_dir/sum(max_diff_dir)
  max_diff_scenario <- VariationScenario(VOTE_model,Xvar = "PROFCAT", Xdir = max_diff_dir)  
  
  max_diff_scenario$H_SEQ   <- rownames(max_diff_scenario)
  max_diff_scenario$VOTE    <- max_diff_scenario$Y
  max_diff_scenario$PROFCAT <- max_diff_scenario$X
  max_diff_scenario$SR      <- max_diff_scenario$Y[,num] /max_diff_scenario$Y[,den]
  
  
  
  unmake_names <- c("SR" = "(SR) Share ratio", unmake_pc_vote_names)
  max_diff_scenario_gg <- melt(
    data = data.table(max_diff_scenario[,c("H_SEQ","VOTE","PROFCAT","SR")]),
    id.vars = c("H_SEQ"),
    measure.vars = names(unmake_names))

  max_diff_scenario_gg <- 
    cbind(max_diff_scenario_gg, "DIR" = paste0(num, " vs. ", den))
  
  max_diff_scenario_gg[,COMPO := as.character(variable)]
  max_diff_scenario_gg[,SAHRE_XY := factor(unmake_names[COMPO], unmake_names)]
  max_diff_scenario_gg[,COMPO := fcase(
    variable == "SR", "Share ratio",
    grepl("^VOTE.", variable), "Changes in Y",
    grepl("^PROFCAT.", variable), "Changes in X")]
  max_diff_scenario_gg[,COMPO := factor(COMPO, levels = c("Share ratio", "Changes in Y", "Changes in X"))]
  max_diff_scenario_gg
}

max_diff_scenario_gg <- rbind(
  Mdiff_VS(den = "Left",    num = "Le Pen"),
  Mdiff_VS(den = "No vote", num = "Le Pen"),
  Mdiff_VS(num = "Mélenchon", den = "Right"))

legend_title3 <- "share ratio (SR)\n\ncandidate or block (Vote)\n\nprofessional category (PC)"
ggplot(max_diff_scenario_gg) +
  geom_vline(aes(xintercept = 0)) +
  geom_line(aes(y = value, x = as.integer(H_SEQ)/10, col =SAHRE_XY, lty = SAHRE_XY)) +
  scale_colour_discrete_qualitative() +
  theme_bw() +
  labs(y = "X-shares (bottom), Y-shares (middle), Y-share ratio (top)",
       x = "value of h",
       col = legend_title3, lty = legend_title3) +
  facet_grid(rows = vars(COMPO), cols = vars(DIR), scales = "free") +
  theme(legend.key.width = unit(3, "line"))

Code
ggsave(here("out/figures", "yx_compo_variation_scatter_by_h_maxdiff.pdf"), height = 4.5, width = 7)

The direction of changes in X can be easily computed from the parameter matrix:

Code
clrInv2 <- function(x) exp(x)/sum(exp(x))
uStar <- function(u) round(clrInv2(u/sqrt(sum(u^2))),3)

rbind(data.frame("Y-ratio" =  "Le Pen vs Left", t(uStar(B["Le Pen", ] - B["Left",]))),
      data.frame("Y-ratio" =  "Le Pen vs No Vote", t(uStar(B["Le Pen", ] - B["No vote",]))),
      data.frame("Y-ratio" =  "Mélenchon vs Right", t(uStar(B["Mélenchon",] - B["Right", ])))) 
Y.ratio Upper Workers Retired Other
Le Pen vs Left 0.144 0.499 0.151 0.206
Le Pen vs No Vote 0.231 0.482 0.133 0.153
Mélenchon vs Right 0.510 0.149 0.181 0.160

6.2.5 Share ratio elasticity tables

Another way to interpret covariate impacts in XY-compositional models is to consider share ratio elasticities. These measure the % change in the ratio of two components of Y relative to the % change in the ratio of two components of X.
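Before applying the package function, the definition can be made concrete with a toy model. The sketch below is entirely hypothetical (a made-up parameter matrix `B_toy` and a clr-linear predictor, not the fitted `VOTE_model` or the CoDaImpact implementation); it approximates one share ratio elasticity by finite differences:

```r
# Toy clr-linear model: clr(Y) = clr(X) %*% B_toy (hypothetical parameters)
clr    <- function(x) log(x) - mean(log(x))
clrInv <- function(z) exp(z) / sum(exp(z))

B_toy <- matrix(c( 0.5, -0.5,     # rows: 3 parts of X, cols: 2 parts of Y
                  -0.2,  0.2,
                  -0.3,  0.3), nrow = 3, byrow = TRUE)
predY <- function(x) clrInv(drop(clr(x) %*% B_toy))

x0 <- c(0.5, 0.3, 0.2)        # initial X composition
u  <- clrInv(c(1, -1, 0))     # perturbation direction on the simplex
h  <- 1e-6                    # small step for the finite difference

x1 <- clrInv(log(x0) + h * log(u))   # perturbation of x0 by h * u on the simplex
y0 <- predY(x0); y1 <- predY(x1)

# elasticity of Y1/Y2 with respect to X1/X2, in direction u
sre <- (log(y1[1] / y1[2]) - log(y0[1] / y0[2])) /
       (log(x1[1] / x1[2]) - log(x0[1] / x0[2]))
sre  # analytically <clr(u), B[,1] - B[,2]> / (clr(u)[1] - clr(u)[2]) = 1.4 / 2 = 0.7
```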

Code
sre_matrix_plot <- function(sre_tab, psize = 5, bin_centers) {
  
  sre_tab$Yr <- paste0(as.character(sre_tab$Yj), " / ",as.character(sre_tab$Yl))
  sre_tab$Xr <- paste0(as.character(sre_tab$Xj), " / ",as.character(sre_tab$Xl))
  sre_tab$Yr <- factor(sre_tab$Yr, (unique(sre_tab$Yr)))
  sre_tab$Xr <- factor(sre_tab$Xr, rev(unique(sre_tab$Xr)))
  Dy <- length(unique(sre_tab$Yj))
  Dx <- length(unique(sre_tab$Xj))
  
  Xblocks <- seq(Dx-1) * (Dx - 1) + .5
  Yblocks <- seq(Dy-1) * (Dy - 1) + .5

  if (missing(bin_centers)) {
    p <- ggplot(sre_tab) +
      geom_point(aes(x = Yr, y = Xr, col = SRE, pch = SRE > 0), size = psize) +
      coord_fixed() +
      geom_hline(yintercept = Xblocks) +
      geom_vline(xintercept = Yblocks) +
      scale_color_continuous_diverging(na.value = "white", rev = TRUE) +
      scale_shape_manual(breaks = c(T,F), values = c(18,20), labels = c("positive", "negative")) +
      labs(x = "share ratios of Y", y = "share ratios of X", col = "share ratio\nelasticity", shape = "sign") +
      theme_bw()  +
      guides(color = guide_colorbar(order = 1)) +
      theme(axis.text.x = element_text(angle = 90,hjust = 1, vjust = .5),
            legend.justification = c(0,1))
  }
  
  if (!missing(bin_centers)) {
    sre_tab$SREB <- sre_tab$SRE
    sre_tab$SREB[!is.finite(sre_tab$SREB)] <- NA
    if (is.null(bin_centers)) {
      breaks <- max(abs(sre_tab$SREB), na.rm = TRUE)
      breaks <- seq(from = -breaks, to = breaks, length.out = 12)
      bin_centers <- breaks[-1] - diff(breaks)/2
    } 
    
    if (is.numeric(bin_centers)) {
      bin_centers <- sort(unique(bin_centers))
      breaks <- bin_centers[-1] + diff(bin_centers)/2
      breaks <- c(bin_centers[1:2] - diff(bin_centers[1:3])/2, breaks)
      
      check <- "Bins should contain minimum and maximum SRE values: %s"
      if (min(breaks) > min(sre_tab$SREB, na.rm = TRUE)) stop(sprintf(check, range(sre_tab$SREB, na.rm = TRUE)))
      if (max(breaks) < max(sre_tab$SREB, na.rm = TRUE)) stop(sprintf(check, range(sre_tab$SREB, na.rm = TRUE)))

    }
    
    sre_tab$SREB <- cut(sre_tab$SREB,breaks,labels = bin_centers, include.lowest = TRUE)
    nbins <- length(bin_centers)
    nnegs <- sum(bin_centers < 0)
    
    bin_cols   <- diverge_hcl(palette = "Blue-Red 3", n = nbins, rev = TRUE)
    bin_cols[nnegs + 1] <- "grey"
    
    bin_sizes  <- sqrt(abs(seq(-psize, psize, length.out = nbins))) + 1
    bin_sizes  <- bin_sizes/max(bin_sizes)*psize
    
    bin_shapes <- unlist(Map("rep",c(20,18),c(nbins - nnegs,nnegs)))
    
    LT <- "share ratio\nelasticity\n± "
    LT <- paste0(LT,(bin_centers[2] - bin_centers[1])/2)
    
    p <- ggplot(sre_tab) +
      geom_point(aes(x = Yr, y = Xr, col = SREB, pch = SREB, size = SREB)) +
      coord_fixed() +
      geom_hline(yintercept = Xblocks) +
      geom_vline(xintercept = Yblocks) +
      scale_color_manual(breaks = rev(bin_centers), values = bin_cols, na.value = "white", drop=FALSE) +
      scale_size_manual( breaks = rev(bin_centers), values = bin_sizes, drop=FALSE) +
      scale_shape_manual(breaks = rev(bin_centers), values = bin_shapes, drop=FALSE) +
      labs(x = "share ratios of Y", y = "share ratios of X",
           col = LT, shape = LT, size = LT) +
      theme_bw()  +
      guides(color = guide_legend()) +
      theme(axis.text.x = element_text(angle = -75,hjust = 1, vjust = .5),
            legend.justification = c(0,0)) + 
      scale_x_discrete(position = "top")
  }
  return(p)
}

6.2.5.1 Fixed vertex directions

The four tabs below illustrate the share ratio elasticities for the case where the direction is fixed towards one of the vertices of the PC composition. Blank rows appear in the graphic because some share ratios of X do not change when moving in the given direction.

Code
sre_tab <- ShareRatioElasticities(VOTE_model,Xvar = "PROFCAT", Xdir = "Upper")
sre_matrix_plot(sre_tab, bin_centers = seq(-175,175,length.out = 11)/1000)

Code
sre_tab <- ShareRatioElasticities(VOTE_model,Xvar = "PROFCAT", Xdir = "Workers")
sre_matrix_plot(sre_tab, bin_centers = seq(-10,10, by = 2)*0.031)

Code
sre_tab <- ShareRatioElasticities(VOTE_model,Xvar = "PROFCAT", Xdir = "Other")
sre_matrix_plot(sre_tab, bin_centers = seq(-10,10, by = 2)*0.0085)

Code
sre_tab <- ShareRatioElasticities(VOTE_model,Xvar = "PROFCAT", Xdir = "Retired")
sre_matrix_plot(sre_tab, bin_centers = seq(-135,135,length.out = 11)/1000)

6.2.5.2 Varying compensating directions

While the previous charts used the same direction for all share ratio elasticities, we can also change them as a function of the components used in the share ratio of X. The compensating direction is particularly interesting for this purpose because the changes in the Y composition are entirely explained by changes in the two compensating components of the X composition.
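To see what makes a compensating direction special, consider a direction whose clr coordinates load on only two parts of X: perturbing X along it changes the ratio of those two parts while leaving every share ratio among the remaining parts untouched. A small sketch of this property with a hypothetical four-part composition (our own illustration, not the construction used internally by the package):

```r
clrInv2 <- function(z) exp(z) / sum(exp(z))

# direction compensating part j against part l (all other clr coordinates zero)
comp_dir <- function(D, j, l) { v <- numeric(D); v[j] <- 1; v[l] <- -1; clrInv2(v) }

x0 <- c(Upper = .20, Workers = .30, Retired = .25, Other = .25)  # hypothetical
u  <- comp_dir(4, j = 1, l = 2)      # compensate "Upper" against "Workers"
h  <- 0.3
x1 <- clrInv2(log(x0) + h * log(u))  # perturbed composition

x1[1] / x1[2] / (x0[1] / x0[2])  # ratio Upper/Workers grows by exp(2 * h)
x1[3] / x1[4] / (x0[3] / x0[4])  # ratio Retired/Other is unchanged (= 1)
```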

Code
sre_tab <- ShareRatioElasticities(VOTE_model,Xvar = "PROFCAT")
sre_matrix_plot(sre_tab, bin_centers = seq(-10,10,by = 2)*.0225)
ggsave(here("out/figures/yx_compo_share_ratio_matrix.pdf"),width = 7, height = 3.5)

The values of the share-ratio elasticities shown above can be understood as a decomposition of the original clr parameters. More precisely, it is possible to show that summing the 15 elasticity values within any of the 24 blocks yields \(0.5 D_y D_x B^*\), i.e. a value proportional to the corresponding clr parameter.

Code
B    <- coef(VOTE_model,"clr",split = TRUE)[["ilr(PROFCAT)"]]
B_df <- data.table(B,keep.rownames = TRUE)
B_df <- melt(B_df,id.vars = "rn")
B_df$Y <- factor(B_df$variable,levels = colnames(B))
B_df$X <- factor(B_df$rn,levels = rev(rownames(B)))
setDT(B_df)

setDT(sre_tab)
B_df2 <- sre_tab[,list(value2 = sum(SRE) / ncol(B) / nrow(B) * 2), by = c("Yl", "Xl")]
B_df2[B_df, value := i.value, on = c("Yl" = "Y", "Xl" = "X")]
B_df2
Yl Xl value2 value
Macron Upper 0.0501914 0.0501914
Le Pen Upper -0.0833846 -0.0833846
Mélenchon Upper 0.0495960 0.0495960
Left Upper 0.0771850 0.0771850
Right Upper 0.0031394 0.0031394
No vote Upper -0.0967272 -0.0967272
Macron Workers -0.0393279 -0.0393279
Le Pen Workers 0.2346584 0.2346584
Mélenchon Workers -0.0877041 -0.0877041
Left Workers -0.0887457 -0.0887457
Right Workers -0.0673250 -0.0673250
No vote Workers 0.0484443 0.0484443
Macron Retired 0.0238771 0.0238771
Le Pen Retired -0.1054013 -0.1054013
Mélenchon Retired 0.0120279 0.0120279
Left Retired 0.0370594 0.0370594
Right Retired 0.0216541 0.0216541
No vote Retired 0.0107828 0.0107828
Macron Other -0.0347406 -0.0347406
Le Pen Other -0.0458725 -0.0458725
Mélenchon Other 0.0260802 0.0260802
Left Other -0.0254987 -0.0254987
Right Other 0.0425316 0.0425316
No vote Other 0.0375001 0.0375001

The visual impression of the parameter matrix \(B\) below confirms the link to the more detailed share ratio representation above.

Code
ggplot(B_df, aes(y = X, x = Y)) + 
  geom_tile(aes(fill = value)) +
  geom_text(aes(label = round(value,3), size = abs(value)),fontface = "bold") +
  scale_size_continuous(range = c(3,4), guide = "none") +
  scale_x_discrete(position = "top") +
  scale_fill_continuous_diverging(palette = "Blue-Red 3") +
  labs(x = "", y = "") +
  guides(fill = guide_colorbar(barwidth = .5, barheight = 15)) +
  theme_minimal() +
  coord_equal()

ggsave(here("out/figures/clr_B_matrix_image.pdf"))

Acknowledgements

The example used to illustrate this work is taken from a Statistical Consulting project class of the Master in Data Science for Social Sciences of the Toulouse School of Economics, in cooperation with the market research agency BVA. The project was a great experience, and we want to thank all the participants, namely Olivier Hennebelle and Alejandro Lara, who advised the project on behalf of BVA, and our four students, Claire Lebrun, Malo Bert, Kyllian James, and Gael Charrier.